Introduction


This book will help you make sense of OAuth, OpenID Connect, and the many moving parts
that come together to make authentication and delegated authorization happen.


You will discover how authentication and authorization requirements changed in past years, and
how today’s standard protocols evolved and augmented their ancestors to meet those challenges:
problems and solutions locked in an ever-escalating arms race.


You will learn both the whys and the hows of OAuth2 and OpenID Connect. You will learn which
parts of the protocols are appropriate for each of the classic scenarios and app types
(sign-on for traditional web apps, single-page apps, calling APIs from desktop, mobile, and web
apps, and so on). We will examine every exchange and parameter in detail, putting everything
in context and always striving to see the reasons behind every implementation choice within
the larger picture.


After reading this book, you will have a clear understanding of the classic problems in authentication
and delegated authorization, the modern tools that open protocols offer to solve those problems,
and a working knowledge of OAuth2 and OpenID Connect. All that will allow you to make informed
design decisions - and even to find your way through troubleshooting and network traces.





Chapter 1 - Introduction to Digital Identity 


In this chapter, you will grasp some of the essentials of identity, both in terms of
concepts and the jargon that we like to use in this context. You'll also get a good feel for the
problems, the classic dragons that we want to slay in the identity space, which also happen to
be the things that Auth0 can do for our customers.


Without further ado, what is the deal with identity? Why is everyone always saying, "Oh, this is
complicated"? Just look at the following picture. It is trivially simple: I have just two bodies
in here and, in your basic physics course, it would be one of the easy problems.


[Figure: a user and a resource]





Figure 1.1 


I have a resource of some kind, and I have a user, an entity of some kind that wants to access
that resource in some capacity. It's just two things doing one action. Why is this so complicated?
Well, for one, there's the fact that this is mission-critical.


When something goes wrong in this scenario, it goes catastrophically wrong. And so, like every 
mission-critical scenario, of course, it deserves our respect and our attention, and our preparation. 
There is a lot of energy that goes into preventing this catastrophic scenario from coming true. 


But in this specific domain of development, the thing that makes this complex is the Cartesian
product of all the factors that come into play to determine what you have to do to have a viable
solution. Consider the following factors:


✓ Resource types: just think of all the types of resources you can have. Just a few years ago,
if you walked into a bank, you'd find a host, some central database, and that's it.
Today, conversely, pretty much everything is accessible programmatically. So you have the
API economy, you have serverless - all those buzzwords actually point to different ways
of exposing resources - and, of course, websites, apps, and all the things that you use in
your daily life. Whenever you interact with a computer system, there is some kind of resource
that you have to connect to. And, from the point of view of a developer, implementing that
connection is actually a lot of work.


✓ Development stacks: there are minor differences between development stacks that translate
into big differences in the code that you have to write for implementing access to a resource
and the way in which you interact with it. This is one level of complexity.

✓ Identity sources: the other level of complexity is the sheer magnitude of the sources of
identities that you can use today.


Think of all the ways in which your own identity gets expressed online. You can be a member
of a social network, an employee of a company, a citizen of a country. All of those
identities are somehow expressed in a database somewhere, and that somewhere determines
how you pull this information out.

You connect to Facebook in a certain way. You connect to Active Directory in a different way.
You get recognized when you're paying your taxes to your country in yet another way. So,
again, we encounter another factor of complexity: if you want to extract identity from these
repositories, you have to find a way of doing it according to each repository’s requirements
and characteristics.
✓ Client types: there are many more complexity factors, but I just want to mention
one more: the incredible richness with which we can consume information today. Think of
all the possible clients that you can use, from your mobile phone and applications to websites,
to your watch. You can use practically anything you want to access the data. And again, this
compounds in terms of complexity with the kinds of resources that you want to access and
the places from which you are extracting information. So, this picture might look simple,
but it is anything but.


Now, what can Auth0 do for you to make this a bit more manageable? We offer many different
things but, in particular, the most salient component of our offering is our service. It is a service
that you can use for outsourcing most of the authentication functions that you need to have in
your solutions, so that you don't have to be exposed to that complexity. In particular, we offer:


✓ Ways of abstracting away the details of how you connect to multiple sources of identities.
Every identity provider has a different style of doing identity transactions, and we
abstract all of that away from you.

✓ A way of dealing with the user-management lifecycle. We have user representations and
features for dealing with the lifecycle of users and similar.

✓ A very large number of SDKs and samples, which help you to cross the last mile so that when
you're using a particular development stack, you can actually use components to connect
to Auth0 in a way that is aligned with the idiom that you're using in that context.

✓ A degree of customization ability that is absolutely unprecedented in the industry. There is
no other service at this point that offers the same freedom you have with Auth0 to customize
your experience.
Now, when you need to connect your application to Auth0, you need to do something to tell
us, "Auth0, please do authentication". And that something in Auth0 is implemented using open
standards.


Open standards are agreements, wide consensus agreements that have been crafted by 
consortiums of different actors in the industry. We identity professionals decided to work on 
open standards when we came to the realization that everyone - users, customers, and vendors -
would be better off if we enshrined in common standards the common messages,
common protocols, and some of the transactions that we knew needed to occur when you're doing
authentication and similar. What happened back then is that we went to semi-expensive hotels
around the world, met with our peers across the industry, and argued about how applications should 
present themselves when offering services in the context of an identity transaction. We discussed 
similar considerations for identity providers. What kind of messages should be exchanged? We 
literally argued message details down to the semicolon. That's how fun standards authoring is, 
but it's all worth it: now that we have open standards and all vendors have implemented
them, you, as the customer, can choose which vendor you want to use without worrying about
being locked into a particular technology or vendor. Above all, you can plan to introduce different
technologies afterward, without worrying about incompatibilities.


Of course, this is mostly theory: a bit like those simplified school problems that disregard friction
or the moon's gravity influencing the tides. In reality, there are always little details that you need to
iron out. But largely, if you have worked in our industry for the last couple of decades, you know that
we are so much better off now that we have those open standards we can rely on.


In identity management, you're going to come across many protocols, many of them probably
not even invented yet. The ones that are a daily occurrence nowadays are:

✓ OpenID Connect, which is used for signing in

✓ OAuth2, which is the basis of OpenID Connect; it is a delegation protocol designed to help
you access third-party APIs

✓ JSON Web Token or JWT, which is a standard token format. Most of the tokens you'll be
working with are in this format

✓ SAML, which is somewhat of a legacy (but still very much alive) protocol that is used for doing
single sign-on across domains for browsers. SAML also defines a standard token format,
which has been very popular in the past and is still very much in use today.


From User Passwords in Every App... 


Let’s spend the next few minutes going through a time-lapse-accelerated-whirlwind tour of how 
authentication technologies evolved. My hope is that by going back to basics and revisiting this 
somewhat simplified timeline, I'll have the opportunity to show you why things are the way they are 
today. In doing so, I'll also have the opportunity to introduce the right terms at the right time. By 
being exposed to new terminology at the correct time, that is to say, when a given term first arose, 
you will understand what the corresponding concepts mean in the most general terms. Contrast
that with the narrower interpretation of a term’s meaning you'd end up with if you were exposed
to it only in the context of solving a specific problem. You might end up thinking that the problem
you are solving at the moment is the only thing the concept is good for, missing the big picture and
potentially stumbling into all sorts of future misunderstandings. We won't let that happen!
Let's go back to the absolute basics and think about the scenario that I described earlier in Figure
1.1 - the scenario in which I have one resource of some kind (let's say, a web application) and a user,
and we want to connect the two. Now, what is identity in this context?


We won't get bogged down in philosophy and similar. Identity here can be defined in a very
operational, very precise fashion. We call digital identity the set of attributes that define a particular
user in the context of a function which is delivered by a particular application. What does that mean?
It means that if I am a bookseller, the relevant information I need about a user is largely their
credit card number, their shipping address, and the last ten books that the user bought. That's their
digital identity in that context. If I am the tax department, then the digital identity of a user is, again,
a physical address, an identifier (here in the USA, the Social Security number), and any other
information which is relevant to the motion of extracting money from the citizen. If I am a service
that does DNA sequencing, the identity of my user is the username that they use for signing in,
their email address for notifications, and potentially their entire genome.
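
To make the definition concrete, here is a minimal sketch in Python of two of the contexts just described. The class name, the attribute names, and the sample values are all hypothetical, chosen only for illustration; the point is simply that each application keeps the attributes its own function needs.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalIdentity:
    """The attributes that define a user for one application's function."""
    context: str
    attributes: dict = field(default_factory=dict)


def shared_attribute_names(a: DigitalIdentity, b: DigitalIdentity) -> set:
    """Return the attribute names that two contexts have in common."""
    return set(a.attributes) & set(b.attributes)


# Hypothetical sample data; none of these values come from a real system.
bookseller = DigitalIdentity("bookseller", {
    "credit_card_number": "0000-0000-0000-0000",
    "shipping_address": "1 Example Street",
    "recent_purchases": ["Book A", "Book B"],
})
dna_service = DigitalIdentity("dna_sequencing", {
    "username": "alice",
    "email": "alice@example.com",
    "genome": "GATTACA",
})
```

With this sample data, `shared_attribute_names(bookseller, dna_service)` is empty: the same person can be described by entirely different attributes in the two contexts.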


You can see how, for all the various functionalities that we want to achieve, we actually have a
completely or nearly completely different set of attributes. These might correspond to the same
physical person or not. It doesn't matter. From the point of view of designing our systems, that's
what the digital identity is. So, you could say that the digital identity of this user is this set of
attributes we can place in the application’s store. Now the problem of identity becomes: when
do I bring those particular attributes into context? The oldest trick in the world is to have the
resource and the user agree on something, such as a shared secret of some sort. So, when the
user comes back to the site and presents that secret, demonstrating knowledge of it,
the website will say: okay, I know who you are, you’re the same user I saw yesterday. Here is
your set of attributes, welcome back. I authenticated the user. In summary, that means grabbing
a set of credentials, sending it over, and comparing it with the credentials that were saved previously
in a database. If they match, the user is authenticated.
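
The match-against-the-database step can be sketched as follows. This is a minimal illustration, not a production recipe: the in-memory dictionary stands in for the database, the function names are hypothetical, and storing a salted PBKDF2 hash instead of the password itself is an assumption added here (the text only says that credentials are saved and compared).

```python
import hashlib
import hmac
import os

# Hypothetical in-memory "database": username -> (salt, derived key).
_user_store = {}

_ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def _derive_key(password: str, salt: bytes) -> bytes:
    """Derive a fixed-length key from the password and a per-user salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, _ITERATIONS)


def register(username: str, password: str) -> None:
    """Save the shared secret (as a salted hash) when the user signs up."""
    salt = os.urandom(16)
    _user_store[username] = (salt, _derive_key(password, salt))


def authenticate(username: str, password: str) -> bool:
    """Return True only if the presented secret matches the saved one."""
    record = _user_store.get(username)
    if record is None:
        return False
    salt, stored_key = record
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(_derive_key(password, salt), stored_key)
```

After `register("alice", "correct horse")`, a call to `authenticate("alice", "correct horse")` succeeds, while a wrong password or an unknown username fails.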
This scenario is summarized in the following picture:


Figure 1.2 
Now, you hear a lot of bad things about usernames and passwords... and they are all true. That's
unfortunate, but it's true. However, it is an extraordinarily simple scheme, and as such, it is very,
very, very resilient. Even if we have more advanced technologies, which do more or less the
same job, passwords are still very popular. I predict that this year, like every year, someone will
say that this is the year in which passwords will die. But I think that passwords will still be around
for some time. My favorite metaphor for this is what happens in the natural world. Humans are
allegedly the pinnacle of evolution. However, there are still plenty of jellyfish in the sea. They are
so simple, and sure, we are more advanced, but I am ready to bet that there are more individual
jellyfish than there are humans. The fact that their body plan is simple doesn't mean that it is
not successful. You'll see, as we go through this history, that passwords are somewhat of a building
block on top of which more advanced protocols are layered. Again, I'm not discounting the efforts
to eliminate passwords and use something better; I'm just trying to set the expectation that
it's still going to take some time.





... to Directories 


Let's make things a bit more interesting. Imagine the scenario in which we have one user and 
one application. Now, extend this scenario to the situation in which this user is an employee of 
some company. There is a collection of applications being used by this particular user in the 
context of the company’s business. Most applications are all part of what the user does in the 
context of his or her employment. Imagine that one application is for expense notes, the other is 
for accounting, the other is for warehouse management. Anything you can think of. A few years 
ago, what happened was that we had a bunch of apps on a computer. Then, we had someone 
showing up with a coaxial cable, installing token ring networks, and placing all these computers 
in the network. But that alone didn't make the environment, and in particular the applications, 
automatically network ready. What happened is that you'd have exactly the situation
in which a user accessed different independent apps which knew nothing
about each other, and which replicated all the functionality that could have been easily centralized.
In particular, every user had different usernames and passwords - or I should say different
usernames, because, of course, people reuse their passwords. Every time users went to a new
app, they had to enter their credentials. And whenever a user had to leave the company, willingly
or not, the administrator had to go on a pilgrimage through all these various apps, chase down the
user’s entries in there, and deprovision them by hand, which of course is a tedious and error-prone flow.
It's difficult. You often hear horror stories of disgruntled employees using procurement systems
to buy large amounts of items just to get back at their former bosses, and being able to
do so because their credentials in the procurement system weren't revoked in a timely manner.


That wasn't a great situation, to say the least. 





What happened is that the industry responded by introducing a new entity, which we call the
directory. The directory is still extremely popular. It is a software component, a service, which
centralizes a lot of the functionalities that you see in Figure 1.3.


[Figure: user, credentials, attributes]
Figure 1.3


Basically, the directory centralized credentials and attributes and made it redundant for applications 
to implement their own identity management logic. At this point, users would simply sign in with 
their own central directory, and from that moment onward, they'd have Single Sign-On access 
to all the other applications. The application developers didn't actually have to code anything 
for identity to achieve that result. In fact, now that the network infrastructure itself provided the 
identity information, administrators could now take advantage of this centralized place to deal 
with the user lifecycle. It can be said that the introduction of the directory is what truly created 
identity administrators as a category of professionals. The ubiquitous availability of directories 
created an ecosystem of tooling that helps people to run operations, identities, and similar. So, this was
a fantastic improvement - which was predicated on the perimeter. In order for all this to work as
intended, you had to have all the actors within that perimeter. The perimeter was often the office
building itself, with users actually walking into the building, sitting in front of a particular physical
device, and having direct “line of sight” with this cathedral in the center of the enterprise: the
directory, a central place knowing everything about everyone.


Cross-Domain SSO 


Of course, we know from current business practices that this approach doesn't scale. It works
well when you are within one company, but there are so many business processes that
involve more than one company.


Think of a classic supplier or reseller. Any of those relationships requires spanning multiple
organizations. And so what happens is that when you have a user in one organization that needs
to access a resource in a different organization, you have a problem. In fact, this user
does not exist in the resource-side directory.


The first way in which the industry tried to solve this problem was to introduce what
we call shadow accounts, which means provisioning the user to the resource-side directory.
This is completely unsustainable, as it presents, at a different scale, the same problems that we
mentioned earlier for the case in which every application handled identity explicitly. Let's say that
we have a user whose lifecycle is managed in one place, their own home directory, but who has been
provisioned an entry in the resource-side directory as well. When the user is deprovisioned from their
home directory, there might be a trail of user accounts provisioned in other directories (such as
the resource-side directory in our scenario) that are still around and that need to be manually
deprovisioned. That's, of course, a big problem, because the deprovisioning isn’t likely to happen
in a timely fashion; changes in general are harder to reflect in distributed systems that are not
centrally managed. Plus, imagine the complexity for this company, which may be a reseller for many
other companies, but needs to duplicate somewhat the work that its customer companies are
already doing in their own directories for managing their own users. It's just not sustainable.


So, what happened was that, as is classic in computer science, we solved this problem by
adding a level of abstraction. We took the capabilities that we have seen for the local directory
case, and we just abstracted them away. We provided the same transactions, but we described
them in a way that is not dependent on network infrastructure. For example, Active Directory,
and directories in general, rely on an authentication protocol called Kerberos, which is very much
integrated with the network layer and hence has specific network hardware requirements. Whereas, of
course, in scenarios spanning multiple companies, we have to cross the chasm of the
public Internet and cannot afford to impose any requirements, as requests will traverse unknown
network hardware.


What happened is that the big guys of that time, Sun, IBM and similar, sat at one table and came
up with a protocol called SAML, which stands for Security Assertion Markup Language. In a
nutshell, the protocol describes a transaction in which a user can sign in in one place and then
show proof of signing in in another place and gain access. Here's how it works. We need to
front the actual resource with some software which is capable of talking
that protocol, which in this particular case is going to be what we call a middleware: a component that
stands between your application and the caller, intercepting traffic and executing logic before
the requests reach the actual application. Similar protocol capabilities are exposed on the
identity provider side. In the topology shown in Figure 1.3, we have the machine already fulfilling
the local directory duties (what we call the domain controller in the directory jargon), and we just
teach that machine to speak a different language, SAML, which can be considered somewhat of
a trading language that we can use for communication outside the company’s perimeter.
To complete this transaction, we need to introduce another concept:
trust. Think of the scenario we were describing earlier, the one within one single directory: in it,
every application and every user implicitly believes and trusts the domain controller. The network
software itself, whenever you need to authenticate, will send you back to the domain controller,
and the domain controller will do its authentication. It is just implicit; it's as natural as the air that
you're breathing, because there is only one place that can perform authentication duties in the
entire network.


Now, look at this particular scenario: 


[Figure: web apps, Company 1, Company 2]


Figure 1.4 


The application within the Company 2 perimeter can be accessed by any of its business partners:
there is now a choice about where we want to get users’ identities from, and there is no longer an
obvious default source of users. We say that a resource trusts an identity provider or an authority
when that resource is willing to believe what the authority says about its users. If the authority
says: “this user is one of my users and successfully authenticated five minutes ago”, then the
resource will believe it. That's all trust means.
When you set up your middleware in front of your application, you typically configure it with the
coordinates of the identity providers that you trust. How does that come into play when you
actually make a transaction? Let's see how this works in an actual flow by describing in detail
each numbered step shown in the following figure:














[Figure: web app (Company 2), browser (Company 1)]
Figure 1.5 


In the first leg of the diagram, the user points the browser to the application and attempts to
GET a page (1). The middleware in front of the application intercepts the request, sees that the
user is not authenticated, and turns the request into an authentication request to the identity
provider (IdP), as it is configured as one of the trusted IdPs (2).


In concrete terms, the middleware will craft some kind of message, probably a URL with specific
query string parameters, and will redirect the browser to one particular endpoint associated
with the identity provider (3).
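
As a rough sketch of what "a URL with specific query string parameters" could look like, consider the following Python fragment. Everything in it is an assumption made for illustration: the host names, the `requester` and `return_to` parameter names, and the configuration dictionary are hypothetical, and actual SAML messages are encoded differently from this simplified form.

```python
from urllib.parse import urlencode

# Hypothetical middleware configuration: the IdP endpoints this app trusts.
TRUSTED_IDPS = {
    "company1": "https://idp.company1.example/signin",
}

# Hypothetical identifier the application uses to present itself to the IdP.
APP_IDENTIFIER = "https://app.company2.example"


def build_signin_redirect(idp_name: str, return_url: str) -> str:
    """Build the URL the browser is redirected to for authentication.

    Raises KeyError if the requested IdP is not in the trusted list.
    """
    endpoint = TRUSTED_IDPS[idp_name]
    query = urlencode({
        "requester": APP_IDENTIFIER,  # who is asking for authentication
        "return_to": return_url,      # where to send the user afterwards
    })
    return f"{endpoint}?{query}"
```

The middleware would then answer the original GET with an HTTP redirect whose location is the value returned by `build_signin_redirect`.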
In this particular scenario, the target endpoint belongs to a local identity provider. You can see that
the call to the IdP authentication endpoint is occurring within the boundaries of the enterprise.
That means that the call will be authenticated using Kerberos, like any other call on the local
network. You can already see this layering of protocols, one on top of the other. Thanks to the
use of Kerberos and the fact that the user is already authenticated with the local directory, the
user will not have to enter any credentials during this call.


Next, the identity provider establishes that the user is already correctly authenticated, and
establishes that the resource is one of the resources that have been recorded and approved.
Because of those positive checks, the IdP issues to the user what we call a security token (4).
A security token is an artifact, a bunch of bits, which is used to carry tangible proof that the
user successfully authenticated. Security tokens are digitally signed. What does that mean? A
digital signature is something that protects bits from tampering. Let's say that someone modifies
any of those bits in transit: when the intended recipient tries to check the signature, it will
find that the signature does not verify. The recipient will know for sure that those bits have
been modified in transit.
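
The tamper-detection property can be demonstrated with a few lines of Python. Note the swap: to stay within the standard library, this sketch uses an HMAC (a keyed hash with a secret shared by both sides) as a stand-in for a real digital signature. It shows only that modified bits fail the check; it does not reproduce the public-key properties of actual token signatures. The function names and the payload format are hypothetical.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> bytes:
    """Compute an integrity tag over the payload (HMAC-SHA256 stand-in)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(payload: bytes, tag: bytes, key: bytes) -> bool:
    """Return True only if the payload is exactly what was signed."""
    return hmac.compare_digest(sign(payload, key), tag)
```

If a token body such as `b"user=alice"` is signed and even one byte is changed in transit, `verify` returns `False` for the altered bytes and `True` for the original ones.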


This property is useful for two reasons. One reason is that, given that we use public-key
cryptography, we expect that the private key that was used to perform the signature is only
accessible by the intended origin of this token. No one else in the universe can produce
that signature but that particular party. Remember what we just said about trust: that property
can be used as proof that a token is coming from a specific entity, and in particular, whether it
is a trusted one.


The second reason is that, given that the token content cannot be modified in transit without
breaking the signature, I can use tokens as a mechanism to provide the digital identity of a user
on the fly. Instead of having to negotiate in advance the acquisition of the attributes that define
the user (the user identity, according to our definition), as an application, I can just receive those
attributes just in time, together with the token. This might be the first and the last time that this
particular user accesses this application, but thanks to the fact that there is trust between the
two organizations, I didn't need to do any pre-provisioning steps.


The attributes that travel inside tokens are called claims. A claim is simply an attribute packaged
in a context that allows the recipient to decide whether to believe that the user does indeed
possess that attribute. Think about what happens when boarding a plane. If I present my passport
to the gate agents, they will be able to compare my name (as asserted by the passport) with the
name printed on my boarding pass and decide to let me go through. The gate agents will reach
that conclusion because they trust the government, the entity that issued my passport. If I pulled
out a Post-it with my name jotted down in my scrawny chicken-legs handwriting and presented it
to the gate agents in lieu of the passport, I would probably not board the plane - in fact, I would
likely be in trouble. The medium truly is the message in this case. The token really does
carry this potential for deciding whether or not you trust that particular information. Attributes
inside tokens become claims. It is an important difference.
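
The passport-versus-Post-it distinction can be expressed as a small sketch. All names here are hypothetical (the `Token` class, the issuer URL, the `signature_valid` flag), and the signature check is assumed to have happened elsewhere; the sketch only shows that the same attribute is believed or ignored depending on who vouches for it.

```python
from dataclasses import dataclass

# Hypothetical list of authorities this application is configured to trust.
TRUSTED_ISSUERS = {"https://idp.company1.example"}


@dataclass(frozen=True)
class Token:
    issuer: str            # who asserts the attributes
    claims: dict           # the attributes being asserted
    signature_valid: bool  # assumed result of an earlier signature check


def accepted_claims(token: Token) -> dict:
    """Return the claims only if they come from a trusted, intact token."""
    if not token.signature_valid or token.issuer not in TRUSTED_ISSUERS:
        return {}
    return dict(token.claims)
```

A token from the trusted issuer carrying `{"name": "Alice"}` yields that claim, while the very same attribute presented by an unknown issuer yields nothing.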


Once the identity provider issues a SAML token, it typically returns it to the browser inside an
HTML form, together with some JavaScript that triggers as soon as the page is loaded - POSTing
the token to the application, where it will be intercepted by the middleware (5).


The middleware looks at the token, establishes whether it's coming from a trusted source,
establishes whether the signature is intact, and so on. If it's happy with all that, it
emits what we call a session cookie (6). The session cookie represents the fact that successful
authentication occurred. By setting a cookie to represent the session, the application will be
spared from having to do the token dance again for every subsequent request. The session
cookie is simply used for enabling the application to consider the user authenticated every time
the application receives a postback.
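
The token-for-cookie exchange might be sketched as below. This is an illustration under stated assumptions, not how any particular middleware works: the in-memory dictionary stands in for a session store, the `session_id` cookie name and both function names are hypothetical, and the claims are assumed to have been validated already.

```python
import secrets

# Hypothetical in-memory session store: session id -> validated claims.
_sessions = {}


def start_session(validated_claims: dict) -> str:
    """Create a session after token validation; return the cookie value."""
    session_id = secrets.token_urlsafe(32)  # unguessable random identifier
    _sessions[session_id] = validated_claims
    return session_id


def user_for_request(cookies: dict):
    """Look up the user from the session cookie; None if not signed in."""
    return _sessions.get(cookies.get("session_id"))
```

On later requests the application only calls `user_for_request` with the incoming cookies; the token itself is not needed again until the session ends.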


This is how SAML solved the particular problem of cross-domain single sign-on. We'll see that
this pattern of exchanging a token for a cookie will also occur with OpenID Connect.


The Password Sharing Anti-Pattern 


All this happened in the business world, but the consumer world also didn't stay still from the
identity perspective. One thing that happened was that, as we moved more and more of our lives
online, we found ourselves more and more often needing to access resources that we
handle in a certain application... from a different application.


Let me give a very concrete example. I guess that many of you have LinkedIn, and many of you
also have Gmail. Imagine the following scenario. Say that you are already signed in
to LinkedIn, in whatever way you want. The mechanics of how you got signed in to LinkedIn are
not the point in this scenario. Say that LinkedIn wants to suggest that you invite all of your Gmail
contacts to become part of your LinkedIn network.


We are using LinkedIn and Gmail only because they are familiar names with familiar use cases,
but we are in no way implying that they are really implemented in this way, nor that they played
any direct role in authoring this course.


Now, how did LinkedIn use to do this? I'm using LinkedIn as an example here, but it's basically
the behavior of any similar service you can think of before the rise of delegated authorization.
Let’s take a look at this flow by following the steps in the following figure.
Figure 1.6


LinkedIn would actually ask you for your Gmail username and password, which are normally
stored and validated by Gmail (1). You would provide LinkedIn with your Gmail credentials (2), and
then LinkedIn would use them to actually access the Gmail APIs used by the Gmail app itself for
programmatic access to its own service (3). This would achieve what LinkedIn wants, which is
to call the APIs in Gmail for listing your contacts (4) and sending emails on your behalf.


What is the problem with this scenario? Many problems, but two, in particular, are impossible to
ignore.


The first problem is that granting access to your credentials to any entity that is not the custodian
of those credentials is always a bad idea. That is mostly because those different entities will not
have as much skin in the game as the entity that is actually the original place for those credentials.
If LinkedIn does not apply due diligence and saves those credentials in an insecure place... sure,
they'd get bad PR, but it would not be the catastrophe that it would be for Gmail, for which user
access is now impacted. For example, Gmail users will need to change passwords, creating a
situation where they are highly likely to defect or at least to experience lower satisfaction with
the service.
Here's the second bad thing. Although the intent that LinkedIn had with this transaction was good
(it is mutually beneficial both for me as a user and for LinkedIn as a service for me to expand my
network), the way in which they have implemented the function gives them way too much power.
LinkedIn can actually use this username and password to do whatever they want with my Gmail.
They can read my emails, they can delete emails selectively, they can send other emails, they can
do everything they want beyond the scenario originally intended - and that's clearly not good.


Delegated Authorization: OAuth2 


In response to the challenges outlined at the end of the preceding section, the industry came up 


with a way of working around the problem of giving too much power to applications. 


OAuth2 was designed precisely to implement the delegated access scenario described earlier,
but without the bad properties we identified as part of the brute force approach. The defining
feature of the OAuth2 approach lies in the introduction of a new entity, the authorization server,
which explicitly handles operations related to delegated authorization. I won't go too much into
the details right now, because I'm going to bore you to death about it later on in this book.
Suffice it to say here that the authorization server has two endpoints:


• The authorization endpoint, designed to deal with the interaction with the end-user.
It's designed to allow the user to express whether they want a certain service to access
their resources in a certain fashion. The authorization endpoint handles the interactive
components of the delegated authorization transaction.


• The token endpoint, which is designed to deal with software to software communication and
takes care of actually executing on the intent that the user expressed in terms of permission,
consent, delegation, and similar concepts. More details later on.


1 The first incarnation of OAuth was OAuth1, a protocol that resolved the delegated access scenario but had several limitations and
complications. The industry quickly came up with an evolution, named OAuth2, which solved those problems and completely
supplanted OAuth1 for all intents and purposes. For that reason, in this text we only discuss OAuth2.


Please note: in the following discussion, we are assuming that the user is already signed in to
LinkedIn even before the described scenario plays out. We don’t care how the sign-in occurred
in this context; we just assume it did. OAuth2, as you will hear over and over again, is not a
sign-in protocol.


Let’s say that, as part of his or her LinkedIn session, the user gets to a point in which LinkedIn
wants to gain access to the Gmail API on his or her behalf, as described in the last section for the
analogous scenario.


In the OAuth2 approach, that means that LinkedIn will cause the user to go to Gmail and grant
permission to LinkedIn to see their contacts and send mail on their behalf. Let’s follow this new
flow by taking a look at this figure:
Figure 1.7


LinkedIn follows the OAuth2 specification to craft an authorization request and redirect the user’s
browser to Gmail’s authorization server and, in particular, the authorization endpoint (1).
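
To make step (1) more tangible, here is a minimal sketch of how a client could assemble such an authorization request. The parameter names (response_type, client_id, redirect_uri, scope, state) come from the OAuth2 specification; the endpoint URL, the client id, and the scope strings are made up for illustration and do not reflect any real provider.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs


def build_authorization_url(authorization_endpoint, client_id, redirect_uri, scopes, state):
    """Return the URL the client redirects the user's browser to."""
    params = {
        "response_type": "code",       # ask for an authorization code
        "client_id": client_id,        # identifier assigned at client registration
        "redirect_uri": redirect_uri,  # where the authorization server sends the user back
        "scope": " ".join(scopes),     # space-separated list of requested permissions
        "state": state,                # opaque value echoed back, used to correlate the response
    }
    return f"{authorization_endpoint}?{urlencode(params)}"


# Hypothetical values, for illustration only.
url = build_authorization_url(
    "https://authorization-server.example/authorize",
    client_id="linkedin-client-id",
    redirect_uri="https://client.example/callback",
    scopes=["contacts.read", "mail.send"],
    state=secrets.token_urlsafe(16),
)
print(url)
```

Note that nothing secret travels in this request: it goes through the browser, so it only carries what the user is allowed to see.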


The authorization endpoint is used by Gmail to prompt the user (2) for credentials if they are
not currently authenticated with the Gmail web application. This is all within the natural order
of things. In fact, it's Gmail asking a Gmail user for Gmail credentials. So, no foul play here,
everything is fine. As soon as the user is authenticated, the Gmail authorization server will prompt
the end-user, saying something along the lines of, "Hey, I have this known client, LinkedIn, that
needs to access my own APIs using your privileges. In particular, they want to see your contacts,
and they want to send emails on your behalf. Are you okay with it?"


Once the user (presumably) says okay, the authorization server emits an authorization code (3).
An authorization code is just an opaque string that constitutes a reminder for the authorization
server of the fact that the user did grant consent for those permissions for that particular client.

The authorization code is returned to LinkedIn via the browser (4). From now on, the rest of the
transaction occurs on the server side.


Please note: before any of the described transactions could occur, LinkedIn had to go to the
authorization server and register itself as a known client. As part of the client registration operation,
LinkedIn received an identifier (called client id) and, most importantly, a client secret. The client
id and client secret will be used for proving LinkedIn’s identity as an application in requests sent
to Gmail’s authorization server, in particular to its token endpoint. The remainder of the diagram
explanation will give you an example of how this occurs.


Now that it has obtained an authorization code, LinkedIn will reach out to the token endpoint of the
authorization server (5) and will present its own credentials (client id and client secret) along with
the authorization code, substantially saying, "Hey, this user consented for this and I'm LinkedIn.
Can I please get access to the resource I want?"
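
Step (5) can be sketched as follows. This is only an illustration of the shape of the request, not a full client: it builds the headers and body without sending anything. The grant_type, code, and redirect_uri parameters and the use of HTTP Basic authentication for the client credentials are defined by the OAuth2 specification; all the concrete values are placeholders I made up.

```python
import base64
from urllib.parse import urlencode


def build_token_request(code, redirect_uri, client_id, client_secret):
    """Return (headers, body) for redeeming an authorization code at the token endpoint."""
    # The client proves its identity with its id and secret, here via HTTP Basic.
    # Assumption: the values contain no characters that need extra encoding.
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({
        "grant_type": "authorization_code",  # tells the server which grant is being used
        "code": code,                        # the code received through the browser
        "redirect_uri": redirect_uri,        # must match the one used in the first request
    })
    return headers, body


# Hypothetical values, for illustration only.
headers, body = build_token_request(
    code="sample-authorization-code",
    redirect_uri="https://client.example/callback",
    client_id="linkedin-client-id",
    client_secret="sample-client-secret",
)
print(body)
```

Unlike the authorization request, this call happens server to server, which is why it is acceptable for it to carry the client secret.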


As an outcome of this, the authorization server will emit a new kind of token, which we call an 
access token (6). The access token is an artifact that is used to grant to LinkedIn the ability to 
access the Gmail APIs (7) on the user's behalf, only within the scope of the permissions that the 


user consented to (8). 


This solves the excessive permissions problem described in The Password Sharing Anti-Pattern
section. In fact, as long as LinkedIn accesses the Gmail APIs only attempting operations the user
consented to, the requests to the API will succeed. As soon as LinkedIn tries to do something
different from the consented operations, like, for example, deleting emails, the endpoint will deny
LinkedIn access, because the access token accompanying the API call is scoped down to the
permissions the user consented to (in our example, read contacts and send emails). Scope is
the keyword that we use here to represent the permissions a client requested on behalf of the
user. This mechanism effectively solves the problem of excessive permissions, providing a way
to express and enforce delegated authorization.
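
Here is a rough sketch of the kind of check an API could perform to enforce scopes. It assumes the API has already validated the access token and extracted the space-separated scope string granted to it; the operation names and scope strings are hypothetical and only mirror the example in the text.

```python
# Hypothetical mapping from API operations to the scope each one requires.
REQUIRED_SCOPE = {
    "list_contacts": "contacts.read",
    "send_email": "mail.send",
    "delete_email": "mail.delete",
}


def is_allowed(operation, granted_scope):
    """Return True only if the granted scope string covers the requested operation."""
    granted = set(granted_scope.split())
    required = REQUIRED_SCOPE.get(operation)
    # Unknown operations are denied by default.
    return required is not None and required in granted


# The user consented to reading contacts and sending emails, nothing else.
token_scope = "contacts.read mail.send"
for operation in ("list_contacts", "send_email", "delete_email"):
    print(operation, is_allowed(operation, token_scope))
```

The important design point is that the decision is made by the API on every call, based on what the token carries, and not by the client's good intentions.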


What we described so far is the canonical OAuth2 use case, the one for which the protocol was
originally designed. In practice, however, OAuth2 is used all over the place, and it is subject to
all sorts of abuses, that is, uses for which OAuth2 wasn't designed. Be on the lookout
for those problematic scenarios: every time you hear that some solution uses OAuth2, please
think of the canonical use case as described here first. OAuth2 supports many other scenarios,
and in this book, we will discuss most of them. However, the core intent is as expressed in the use
case we described in this section. Thinking about whether a solution is using OAuth2 in line with
the intent expressed here, or deviates from it significantly, is a useful mental tool to verify whether
you are dealing with a canonical scenario or if you need to brace for non-standard approaches.


Layering Sign In on Top of OAuth2: OpenID Connect 


Let me give you a demonstration of one particularly common type of OAuth2 abuse. As OAuth2
and delegated authorization scenarios started gaining traction, many application developers
decided that they wanted to do more than just calling APIs. They wanted to achieve in the
consumer space what we achieved with SAML. They wanted to allow users to sign in to their apps
reusing accounts living in a completely different system. Instantiating this new requirement in the
scenario we've been discussing, LinkedIn might like users with a Gmail account to be able to use
it to sign in to LinkedIn directly, without the need to create a LinkedIn account. In other words,
LinkedIn would just want users to be able to sign up for LinkedIn reusing their Gmail accounts.





This is a sound proposition because, in many cases, people aren't crazy about creating
new accounts, new passwords, and similar. So, making it possible to reuse accounts is not a bad
idea in itself.


However, OAuth2 was not designed to implement sign-in operations. Most providers only exposed
OAuth2 as a way of supporting delegated authorization for their API, and did not expose any
proper sign-in mechanism as it wasn’t the scenario they were after. That didn’t deter application
developers, who simply piggybacked on OAuth2 flows to achieve some kind of poor man's sign-in.
Imagine the delegated authorization scenario described for the canonical OAuth2 flow and
imagine it taking place with the user not being previously signed in to LinkedIn. The following
picture describes this flow:

LinkedIn can perform the dance to gain access to Gmail APIs without having any authenticated
user signed in yet (1). As soon as LinkedIn successfully accesses Gmail APIs (2), it might reason,
“Okay, this proves that the person interacting with my app has a legitimate account in Gmail”,
so LinkedIn might be satisfied by that and consider this user authenticated - which in practice
could be implemented by creating and saving a session cookie (3), as we did during sign-in
flows early on when we discussed the SAML approach.


This would be a good time to remind you that we are using LinkedIn and GMail only 
because they are familiar names with familiar use cases, but we are in no way implying 


that they are really implemented in this way. 


This pattern for implementing sign-in is still a common practice today. A lot of people do this.
It's usually not a good idea, mainly because access tokens are opaque to the clients requesting
them, which makes many important details impossible to verify. For example, the fact that an
access token can be used for successfully calling an API doesn't really say anything about
whether that access token was issued for your client or for some other application. Someone
could have legitimately obtained that access token via another application (in our scenario
not as LinkedIn, but as some other app) and then somehow managed to inject the token in
the request. If LinkedIn just uses that token for calling the API and it reasons, “Okay, as long
as I can use this token to call the API without getting an error, I'll consider the current user
authenticated”, then LinkedIn would be fooled into creating an authenticated session.


Another consequence of the fact that access tokens are opaque to clients is that an attacker 
could get a token from a user and somehow inject it in the sign-up operation for a completely 
different user. Once again, LinkedIn wouldn't know better because unless the API being called 
returns information that can be used to identify the calling user, the sheer fact that the API 
call succeeds will not provide any information the client can use to determine that an identity 


swap occurred. 





The attacks that I'm describing are variants of what is called the Confused Deputy attack, and they
are a classic shortcoming of piggybacking sign-in operations on top of OAuth2.


Even more aggravating: with this approach, there is no way to standardize the OAuth2 based
sign-in flow. In our model scenario, the last mile is a successful call to Gmail APIs. If I want to
apply the same pattern with Facebook, the last mile would be a successful call to the Facebook
Graph APIs, which are dramatically different from the Gmail API. That makes it impossible to
enshrine this pattern in a single SDK that can be used to implement sign in with every provider
across the industry, even if they all correctly support OAuth2.


This is where the main players in the industry once again came together and decided to 
introduce a new specification, called OpenID Connect, which formalizes how to layer signing 
in on top of OAuth2. I'll go into painstakingly fine details about that effort in the rest of the 
book, but in a nutshell, the central point of the approach is the introduction of a new artifact, 
which we call the ID token. The ID token can be issued by an authorization server via all the 
flows OAuth2 defines. OpenID Connect describes how applications can, instead of asking 
for an access token (or alongside access token requests), ask for an ID token. The following
picture summarizes one such flow:

An ID token is a token meant to be consumed by the client itself, as opposed to being used
by the client for accessing a resource. The characteristic of the ID token is that it has a fixed 
format that clients can parse and validate. The use of a known format and the fact that the token 
is issued for the client itself means that when a client requests and obtains an ID token, the 
client can inspect and validate it - just like web apps secured via SAML inspected and validated 
SAML tokens. It also means the ability to extract identity information from it, once again, just 
like we learned is common practice with SAML. Those properties are what make it possible
to achieve proper sign-in using OAuth2. The innovations introduced by OpenID Connect didn't
stop there: the new specification introduced new ways of requesting tokens, including one in
which the ID token can be presented to the client directly via the front channel, between the
browser and the application. That makes it possible to implement sign-in very easily, just like
we have learned in the SAML case, without having to use secrets and a back-channel integration
flow as the canonical OAuth2 API invocation pattern required.
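
To give a feel for what "parse and validate" means, here is a deliberately incomplete sketch. In OpenID Connect the ID token is a JSON Web Token (JWT), so its claims can be decoded and inspected by the client. The claim names (iss, sub, aud, exp) are the standard ones; the issuer, client id, and the sample token are invented for the example. Most importantly, this sketch does NOT verify the token's signature, which any real client must do before trusting a single claim.

```python
import base64
import json
import time


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    # JWT segments drop base64 padding; restore it before decoding.
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def check_id_token_claims(id_token, expected_issuer, client_id, now=None):
    """Decode the claims and run basic checks. Signature verification is omitted here."""
    header_b64, payload_b64, signature_b64 = id_token.split(".")
    claims = json.loads(b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")
    audience = claims.get("aud")
    audiences = audience if isinstance(audience, list) else [audience]
    if client_id not in audiences:
        # This is the check that an opaque access token does not allow.
        raise ValueError("token was not issued for this client")
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    return claims


# Hypothetical, unsigned sample token built only to exercise the function.
sample_claims = {
    "iss": "https://authorization-server.example/",
    "sub": "user-123",
    "aud": "linkedin-client-id",
    "exp": 2000000000,
}
sample_token = ".".join([
    b64url_encode(json.dumps({"alg": "RS256", "typ": "JWT"}).encode("utf-8")),
    b64url_encode(json.dumps(sample_claims).encode("utf-8")),
    "placeholder-signature",
])
claims = check_id_token_claims(
    sample_token,
    expected_issuer="https://authorization-server.example/",
    client_id="linkedin-client-id",
    now=1700000000,
)
print(claims["sub"])
```

The audience check is what closes the door on the token injection problems described earlier: a token minted for another application simply fails validation.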


What we have seen in this chapter can be thought of as a rough timeline for the sequence
of events that culminated with the creation of OpenID Connect. In the next chapters, we will
expand on the high level flows described here, going deep into the details of the protocol.





Auth0: an Intermediary Keeping Complexity at Bay


What's the role of Auth0 in all this? You can think of Auth0 as an intermediary that has all
the protocol capabilities needed to talk to pretty much any application that supports the
protocols that you support, such as OAuth2, OpenID Connect, SAML, and WS-Federation.

You can simply integrate your application with Auth0, which in a nutshell, is a super authorization
server, using any of the standard protocol flows we described in this chapter. From that moment
on, Auth0 can take over the authentication function: when it’s time to authenticate, your app
can redirect users to Auth0 and, in turn, Auth0 will talk to the different identity providers you
want to integrate with, in each case using whatever protocol each identity provider requires. If
the identity providers of choice are using one of the open protocols I mentioned, the integration
Auth0 needs to perform is very easy. But even if they are using a proprietary approach, for the
application developer, it doesn't matter. Once the app redirects to Auth0, Auth0 takes care of
the integration details. For you, it's just a matter of flipping a switch saying, “I want to talk with
this particular identity provider” - the result, mediated by Auth0, will always come in the format
determined by the open protocol you chose to use for integrating with Auth0. Concretely, that's
what we meant earlier when we stated that Auth0 abstracts away the problem from you.


In addition, Auth0 offers a way of managing the lifecycle of a user. Auth0 maintains its own user
store; it integrates with external user stores and exposes various operations you can perform
for managing users. For example, you can have multiple accounts sourced from multiple identity
providers, that accrue to the same account in Auth0 and your app. You can normalize the set of
claims that you receive from different identity providers so that your application doesn't have to
contain any identity provider specific logic.


We also provide ways of injecting your own code at authentication time, so that if you want to 
execute custom logic, for example, subscription, or billing, or any functionality which just makes 


sense in your scenario to occur at the same time of authentication, you can easily achieve that. 


You have full control over the experience your users will go through, as Auth0 allows you to
customize every aspect of the authentication UX. Auth0 makes it very easy for you to use
the set of features, mostly by providing you with a dashboard that has a very simple point and
click interface. You can also use Auth0’s management APIs to achieve programmatic access to
everything the dashboard does, and more.


That's it for Identity 101. It was a pretty quick whirlwind tour of the last 15 to 20 years of evolution 


in the world of digital identity. In the next chapters, we'll spend a bit more time sweating the details. 





Chapter 2 - OAuth2 and OpenID Connect 


Let's dig a bit deeper, and specifically turn our attention to OAuth and OpenID Connect (OIDC) 


as protocols. 


Have you ever read any of the specifications of those protocols? I am an old hand at this: I
was working in this space when there was still CORBA, WS-Trust, and various other old man's
protocols. In the past, identity protocols tended to be extraordinarily complicated: they were
XML-based, and exhibited high assurance features that made them hard to understand and
implement. For example, the cryptography they used supported what was called message-based
security - granting the ability to achieve secure communications even on plain HTTP. It was an
interesting property, but it came at the cost of really intricate message formatting rules that made
implementation costs prohibitive for everyone but the biggest industry players.


Now, the new crop of protocols, OAuth, OpenID Connect, and similar, are based on simple HTTP 
and JSON - a reasonably simple format - and they heavily rely on the fact that everything occurs 
on secure channels. This simple assumption enormously simplifies things: together with other 


simplifications and cuts, this makes the new protocols more approachable and at least readable. 


However, we are not exactly talking about Harry Potter. Ploughing through eighty-six pages of 
intensely technical language, such as the ones constituting the OpenID Connect Core specification, 
is a pretty big endeavor, even for committed professionals. If you work in the identity space, 
you'll find yourself referring to the specifications in detail, over and over again, with a lawyer-like 
focus, on each and every single word - those documents are dense with meaning. You can also
see that the specifications have a pretty high cyclomatic complexity. That's to say: there are
multiple links that provide context and, usually, there is not a lot of redundancy. If there is a link
pointing to another specification defining a concept used in the current document, you've got to
follow the link and actually learn about that concept before you can make any further progress.
There's really a very large number of such specifications, even if you limit the scope to just one
or two hops from the core OpenID Connect and OAuth2 specs. All the specifications that
you see in the constellation of OAuth, OpenID, JWT, JWS, and similar are the core,
describing the most fundamental aspects that come into play when handling the main scenarios
those specifications are meant to address. There is an entire ring of best practices or new
capabilities not shown here. The complete picture is, in fact, much larger.

The main reason for which I am showing you this is to dispel the notion, which a lot of people really
like to believe, that adding identity capabilities to an application is just a matter of reading the
spec: if you want to do modern identity, just read the OAuth2 and OpenID Connect specifications,
and you'll be fine. Of course, the reality is quite different. If that were true, then not a lot of
people would be doing modern authentication nowadays.


In fact, reading all these things is our job, as identity professionals - as the ones who build identity 
services, SDKs, quick starts, samples, and guides that developers can use for getting their job 
done without necessarily having to be bogged down in the fine-grained details of the underlying 
protocols. That said, given that the book you are reading is meant to be read by aspiring identity 
professionals, the fine-grained details of the protocol are among the things we want to learn 


about - and what you'll find in abundance in the rest of the text. 


However, I dislike the classic academic approach so common in other learning material about
identity. There you just get the lecture and a laundry list of the concepts listed in these various
specifications - college style - and are expected to figure out on your own how they apply to your
scenarios. The messages, artifacts, and practices defined in those specifications are all there
for specific reasons. Typically, it is for addressing use cases and scenarios. It's just that they
are usually not presented in a scenario-based fashion, as it would not
be economical in a specification to do so. That's a great approach for formal descriptions and
keeping ambiguity to a minimum, but not great for actually understanding how to apply things
concretely.


I'm going to turn things around, and actually, apart from giving you some basic definitions, I want
to operate at the scenario level. I want you to understand why things are the way they are and
how they are applied in particular solutions rather than just ask you to study for a test. In the
process, we will eventually end up covering all the main actors and all the main elements in the
specifications. Simply, we will not be following the traditional order in which those artifacts are
listed in the specs themselves. We'll just follow the order dictated by the jobs to be done that
we want to tackle.


OAuth2 Roles 


Let's start with the few definitions that I mentioned we need before starting our scenario-based
journey through the specifications. OAuth2 and OpenID Connect define a number of primitives 


that are required for describing what's going on during identity transactions. 


In particular, OAuth2 introduces several canonical roles that different actors can play in the context 


of an identity transaction. As OpenID Connect is built on OAuth2, it inherits those roles as well. 


• The first one is the resource owner. The resource owner is, quite simply, the user. Think of
the LinkedIn and Gmail scenario in the preceding chapter: the resource LinkedIn wants to
access is the user's Gmail inbox; hence the user in the scenario is the resource owner.


• Then we have the resource server, which is the guardian of the resource, the gatekeeper
that you need to clear in order to obtain access. It typically is an API. In our model scenario,
the resource server is whatever protects the API that LinkedIn calls for enumerating contacts
and sending emails with Gmail on behalf of the resource owner.


• Then, there is the client, probably the entity that is most salient for developers. The client,
from the OAuth2 perspective, is the application that needs to obtain access to the resource.
In our example, that would be the LinkedIn web application.

For OAuth2, which is a delegated authorization protocol and a resource access
protocol, every application is modeled as a client. However, we'll see that when we
start layering things on top of OAuth2 (for example, when we use OpenID Connect for
signing in), very often what, according to the spec jargon, is called the client will, in
fact, be the resource that we want to access. In that sentence, I use “resource” not
in the OAuth sense, but in the general English language sense of the word. You can
see how naming “client” the resource you want to gate access to might be confusing!


Now that you have seen in Chapter 1 how OpenID Connect was built on top of
OAuth2 scenarios, you know why. That's because in OpenID Connect signing in
means requesting an ID token, which is a special kind of token meant to
be consumed by the requestor itself, rather than used for accessing an external resource.
Your application is both the client (because it requests the ID token) and the resource
itself (because it consumes it instead of using it for calling an API), but the term
we end up using for describing the app in protocol terms is just client. That can
be confusing for the uninitiated, but that's the way it is. I will often highlight this
discrepancy throughout the book.


• Finally, we have the authorization server, which, as defined in Chapter 1, Introduction to
Digital Identity, is the collection of endpoints used for driving the delegated authorization
scenarios described there (and many more).


The authorization server exposes the authorization endpoint, which is the place where users
go for anything entailing interactivity. Practically speaking, the authorization endpoint serves
back web pages. It's not always literally the case, as we'll see in the chapter about SPA, but the
cases in which we don't show a UI on the authorization endpoint are an exception.


The authorization server also features a token endpoint, which is the endpoint that apps typically
talk to in programmatic fashion, performing the operation that actually retrieves tokens.


Authorization and token endpoints are defined in OAuth2 Core. OpenID Connect augments
those with the discovery endpoint. This is a standard endpoint that advertises, in a
machine-consumable format, the capabilities of the authorization server. For example, it will list information
like the addresses of the two endpoints that I just described. Another essential piece of information the
discovery endpoint provides is the location of the keys that OIDC clients should use for validating tokens issued
by this particular authorization server, and so on, and so forth.
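
As a small illustration, here is how a client might read the handful of values it cares about from a discovery document. The field names (issuer, authorization_endpoint, token_endpoint, jwks_uri) and the /.well-known/openid-configuration path come from the OpenID Connect Discovery specification; the sample document itself is invented, and a real one contains many more entries. No network call is made here: the sketch just parses a string.

```python
import json

# Invented sample, trimmed down to the fields discussed in the text.
SAMPLE_DISCOVERY_DOCUMENT = """
{
  "issuer": "https://authorization-server.example",
  "authorization_endpoint": "https://authorization-server.example/authorize",
  "token_endpoint": "https://authorization-server.example/oauth/token",
  "jwks_uri": "https://authorization-server.example/.well-known/jwks.json"
}
"""


def discovery_url(issuer):
    """Build the well-known address where the discovery document is published."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def read_endpoints(raw_document):
    """Extract the endpoint addresses and the key location from a discovery document."""
    document = json.loads(raw_document)
    wanted = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")
    return {name: document[name] for name in wanted}


endpoints = read_endpoints(SAMPLE_DISCOVERY_DOCUMENT)
print(discovery_url(endpoints["issuer"]))
print(endpoints["jwks_uri"])
```

In practice, this is what allows SDKs to be configured with little more than the issuer address: everything else can be looked up.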


OAuth2 Grants and OIDC Flows 


The most complicated things in the context of OAuth2 and OpenID Connect are usually what
we call the grants. In a nutshell: grants are just the set of steps a client uses for obtaining some
kind of credential from the authorization server, for the purpose of accessing a resource. As
simple as that. OAuth2 defines a large number of grants because each of them makes the best
of the ability of a different client type to connect to the authorization server in its own way,
according to its peculiar security guarantees. Grants also serve the purpose of addressing
different scenarios, such as scenarios where access is performed on behalf of the user vs. via
privileges assigned to the client itself and many more.


I won't go into details of the various grants here because we are going to pretty much look at
all of them inside out through this book. Suffice it to say at this point that there is a core set of
grants originally defined by OAuth2: Authorization Code, Implicit, Resource Owner Password Credentials,
Client Credentials, and Refresh Token. OpenID Connect introduces a new one, the Hybrid,
which combines two particular OAuth2 grants into one single flow.
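
As a quick reference, the grants just listed can be told apart on the wire by two request parameters: response_type, sent to the authorization endpoint, and grant_type, sent to the token endpoint. The values below are the ones defined by the OAuth2 and OpenID Connect specifications; the dictionary layout and the helper function are just my own way of arranging them. For the Hybrid flow I show only one of the response_type combinations the specification allows.

```python
# None means the grant does not involve that endpoint.
GRANTS = {
    "Authorization Code": {"response_type": "code", "grant_type": "authorization_code"},
    "Implicit": {"response_type": "token", "grant_type": None},
    "Resource Owner Password Credentials": {"response_type": None, "grant_type": "password"},
    "Client Credentials": {"response_type": None, "grant_type": "client_credentials"},
    "Refresh Token": {"response_type": None, "grant_type": "refresh_token"},
    # One possible combination; others exist.
    "Hybrid (OpenID Connect)": {"response_type": "code id_token", "grant_type": "authorization_code"},
}


def involves_user_interaction(grant_name):
    """A grant can show UI to the user only if it goes through the authorization endpoint."""
    return GRANTS[grant_name]["response_type"] is not None


for name in GRANTS:
    print(f"{name}: interactive={involves_user_interaction(name)}")
```

Seen this way, the split between the two endpoints described earlier maps directly onto which grants can prompt the user and which ones are pure software to software exchanges.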


In addition to the grants defined by the core OAuth2 and OpenID Connect specifications, the
OAuth2 working group at IETF and the OpenID Foundation continuously produce independent 
extensions, devised to address scenarios that weren't originally contemplated by the core 
specs, or deemed too specific for inclusion. The ability to add new specifications to extend 
and specialize the core spec is a powerful mechanism, which helps the community to receive 


the guidance it needs to address new scenarios as they arise. 


The book will examine every essential grant in detail, with a particular emphasis on the
scenarios for which a specific grant is most appropriate, the reasons behind the main features
characterizing every grant, and the most important factors that need to be taken into account
when choosing to solve a scenario with a specific grant.





Chapter 3 - Web Sign-In 


Starting with this chapter, we are going to dive deeper into concrete scenarios. Let's begin with 


the most common one: Web Sign-In. 


Confidential Clients 


Before I actually get into the mechanics of it, I have to make a couple of high-level introductions
of artifacts and terminology that we use in the context of OAuth2 and OpenID Connect. In particular,
I want to talk to you about client types.


A confidential client in OAuth2 is a client that has the ability to prove its own application programmatic 
identity. It's any application to which the authorization server can assign a credential of some type 
- that makes it possible for the app to prove to the authorization server its identity as a registered 


client during any request. 


This typically happens with any app that is a singleton. Think of a website that is running on a
certain set of machines. Even if executing on a cluster, it's one logical entity running there. When
I provision my client by registering it at the authorization server, I have a clear identity for it. I
have URLs that determine where this client lives, and I have a flow for getting whatever secret
we want to agree upon, which I can save and protect locally.


Allegedly, if the application is running on a server, the server administrator is the only person
that can access that secret. Contrast all of this with applications that, for example, run on your
device: those apps are anything but a singleton. Every phone will have a different instance of Slack, for
example. When you download the application from the application store, there is no easy way
for you to get a unique key that would represent that particular instance of a client.


You certainly cannot embed such a key in the code, because it would be decompiled in a second -
and you'd be in trouble. Also, the device is always available in the pockets of the people using it.
It is outside of your control, so there is no way for you to protect the key for an extended period
of time. A motivated hacker has unlimited time to actually dig into the device, as opposed to a
server that needs first to be breached before it can reveal its secrets.


In summary, confidential clients are clients for which it's appropriate to assign a secret. The
classic scenario is websites that run on a server.


But you can also think of an IoT scenario, in which you want to identify the device itself rather
than the user of a device.

Another scenario involves long-running processes.


For example, consider a continuous integration system that uses your Jenkins and compiles
your product overnight, runs tests, and similar long-running tasks. It's likely that you'll want that
daemon to run with its own identity, as opposed to the identity of a user. In fact, if you use the
identity of a user, and then the user leaves the company, it may happen that everything grinds
to a halt, and no one knows why. This happens because very often people forget that a particular
user identity was used for running these scripts. So, assigning its own identity to the daemon is
a better option.


One subtlety here is that even if an application is a confidential client, not every single grant that
the application performs will require the use of a client credential. It is a capability that the
application has, but it doesn't have to exercise it every time. There will be, in fact, scenarios, like
the one that we are about to explore, in which there is no need to use keys. Typically, the key is used
for proving your identity as a client when you're asking for a token for accessing a different resource.
Instead, we'll see that in the case of Web Sign-In, you are the resource.


The Implicit Grant with form_post 


The grant that we're going to use here is the implicit grant with form_post. It is kind of a mouthful,
but, unfortunately, that's the way the protocol defines it. This is something that wasn't possible
before OpenID Connect. It is the easiest way to achieve Web Sign-In using OpenID Connect and
it is really similar to SAML. In fact, it basically follows the same steps that I described when I
demonstrated the first SAML flow in the first chapter, Introduction to Digital Identity.


This grant constitutes the basis of something that only OpenID Connect can do, that is, combining
signing in to a website with granting that website delegated permission to access an API.
What we are going to do now is to study half of that transaction. We'll only look at the sign-in part.
When we talk about APIs, we'll look at the other half. Those two halves can be combined so
that the experience for the user is truly streamlined. Also, in terms of design, combining sign-in and
API invocation capabilities makes it possible for an application to play multiple roles. This is a really
powerful scenario that wasn't possible before OpenID Connect.


Given that we're using the front channel, we don't need to use the application credentials. We'll
see that there are security implications here and there, but, as just said, it is just like SAML.


Setting this thing up from a developer perspective is a thing of beauty. You just install your
middleware in front of your application. Then, you use your configuration to point it to the
discovery endpoint, as we mentioned in Chapter 2, OpenID Connect and OAuth, and just specify
the identifier that you were assigned as a client when you registered your application. In the
authorization server, you need to specify the address where you want tokens to be sent back to the
app, and you're done.


A detailed walkthrough 


Let's see in detail how the implicit grant with form_post works. Take a look at the scenario shown
in Figure 3.1:

We have a user with a browser, a web application protected by a middleware implementing
OpenID Connect, and an authorization server.


You might notice that, in this authorization server, I'm showing only the authorization endpoint
and the discovery endpoint. I don't show the token endpoint because, in this particular flow, we
don't use it.


The idea is that, as soon as this web application comes alive, the middleware will reach out
to the discovery endpoint and will learn everything it needs about the authorization server. In
particular, it will get the address of the authorization endpoint and the key to be used for checking
signatures. We'll show how all those steps occur in detail later on (see the Metadata and Discovery
section). For now, we'll focus on the authentication phase proper.

Let's see how the access plays out by describing each numbered step.


1. Request Protected Route on Web App
In the first step, the browser reaches out to the application to get one particular route,
which happens to be protected, hence not accessible by anonymous requests.

2. Authorization Request Redirect
The middleware intercepts this call and emits an authorization request for the authorization
server in response. The HTTP response has an HTTP 302 status code, i.e. it's a redirect,
and has a number of parameters meant to communicate to the authorization server all the
information necessary to perform the required authentication operation.


(2) 302 HTTP/1.1
    Location: https://flosser.auth0.com/authorize
        ?client_id=ZuGSLZ6HjGRA8LMtopHBzcKHhCXFtMk8
        &response_type=id_token&response_mode=form_post
        &scope=openid%20profile%20email
        &nonce=c 7a6fd988dd

Callouts in the figure: the authorization endpoint; the identifier of the app at the AS; what
artifact(s) I want; how I want the artifacts returned; where I want to receive the results back;
why I want the artifacts (what content, capabilities).

Figure 3.2


It’s really important to understand the anatomy of this message since all the other messages that
we'll see will be a derivative of this. Here, we're going to touch on all the most relevant parameters.

✓ Authorization endpoint. The first element is the authorization endpoint. That's the address
where we expect the authorization endpoint functionality to be for the authorization server.

✓ Client ID. The client_id parameter is the identifier of your application at the authorization
server. The authorization server has a bundle of configuration settings associated with your
app, and it will bring those into focus when it receives this particular client ID.

✓ Response type. The response_type parameter indicates the artifact that I want. In this
particular case, I want to sign in, so I need an ID token. Consequently, the value of the
response_type parameter will be id_token. There is a large variety of artifacts that I can ask
for. I can also ask for combinations of artifacts: we'll see those combinations in detail.


✓ Response mode. Response mode is the way in which I want these artifacts to be returned to
me. I have all the choices that HTTP affords me. I can get things in the query string, but this
is usually a bad idea because artifacts end up in the browser history. I can get the artifacts
in a fragment, which is still part of the URL but not transmitted to a server. I can get them as
a form post (form_post), which is what we are using here. In this case, we just want to make
sure that we post the token to our client. This way, we don't place stuff in the query string,
which, as mentioned, is generally a bad practice from the security perspective. The use of
a POST also allows us to have large tokens. In fact, if you place stuff anywhere but in
a form post, then you might run into size limitations.


✓ Redirect URI. The redirect_uri parameter has a very important role. It represents the address
in my application where I expect tokens and artifacts to be returned. I need to specify
this because the tokens that we use in this context are what we call bearer tokens.

Bearer tokens are tokens that can be used just by owning them. In other words, I can use one
directly, without needing to do anything else, like other types of tokens might require. For
example, other types of tokens may require me also to know a key and use it at the same
time. But bearer tokens don't. You will hear much more about bearer tokens in the section
about token validation (see Principles of Token Validation). So, it is imperative that I use only
HTTPS so that no intermediary can interject itself and intercept traffic.





Also, it is very important that I specify the exact address I want the response to be sent back
to. If I don’t and, for example, instead of doing a strict match with the address they provide,
I allow callers to attach further parameters, I put communication security at risk. What might
happen, and it did happen in the past, is that there might be flaws in the development stack
I'm using that will cause my request to be redirected elsewhere. That would mean shipping
my bearer tokens to malicious actors, and that’s all they’d need to impersonate me. OAuth2
and OpenID Connect are strict about this: the redirect URI that you specify in the request
has to be an exact match of what you want.


✓ Scope. The scope parameter represents the reason for which I'm asking for the artifacts. In
the example above, I specified openid, profile, and email, which are scopes that cause the
authorization server to issue an ID token with a particular layout. The openid scope is somewhat
redundant with the earlier response type, but I’m also asking to enrich this ID token with the
profile and email information of the user, if present.

In short, with the scope, I am specifying the reason for which I want the artifacts I am
requesting. We will see that, when we use APIs, we'll be asking for the particular delegated
permissions we want to acquire.


✓ Nonce. The nonce parameter is mostly a trick for preventing token injection. At request time,
I generate a unique identifier, and I save it somewhere (like in a cookie). This identifier is
sent to the authorization server, and eventually, the ID token that I receive back will have a
claim containing the same identifier. At that point, I'll be able to compare that claim with the
identifier that I saved, and I'll be confident that the token I received is the one I requested. If
I receive a token that has a different (or no) identifier, I have to conclude that the response
has been forged and the token injected.







It is worth mentioning that I specified form_post as the value for response_mode because the
default response mode for the id_token response type would be different (it would have been
fragment); hence I had to override it explicitly. The following table shows the default response
mode for each response type defined by OAuth2 and OpenID Connect. If I omit response_mode in the
request, the authorization server will apply its default value.
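To make the anatomy of the request concrete, here is a minimal Python sketch of how a middleware might assemble the redirect URL of step 2. The function name, the callback address, and the client ID placeholder are assumptions made for illustration; only the parameter names and the values discussed above come from the protocol.

```python
import secrets
from urllib.parse import quote, urlencode


def build_authorization_request(authorization_endpoint, client_id, redirect_uri, nonce):
    """Assemble the URL the browser is redirected to in step 2 (illustrative helper)."""
    params = {
        "client_id": client_id,            # identifier of the app at the authorization server
        "response_type": "id_token",       # the artifact we want: an ID token
        "response_mode": "form_post",      # override the default (fragment) for id_token
        "redirect_uri": redirect_uri,      # the exact address where we expect the response
        "scope": "openid profile email",   # what content we want in the ID token
        "nonce": nonce,                    # echoed back as a claim inside the ID token
    }
    # quote_via=quote encodes spaces as %20 rather than "+".
    return authorization_endpoint + "?" + urlencode(params, quote_via=quote)


# Hypothetical values. The nonce would also be saved (for example in a cookie)
# so that it can be compared later with the nonce claim of the ID token.
nonce = secrets.token_hex(8)
url = build_authorization_request(
    "https://flosser.auth0.com/authorize",
    "YOUR_CLIENT_ID",
    "https://app.example.com/callback",
    nonce,
)
print(url)
```

The middleware would send this URL in the Location header of the 302 response; a real implementation would typically read the endpoint from the discovery document rather than hard-coding it.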


3. Authorization Request
The next step for the browser is to honor the 302 redirection and actually perform a GET
hitting the authorization endpoint with all the parameters I just described.

From now on, the authorization server does whatever it deems necessary to authenticate
a user and to prompt for consent. How this occurs isn't specified by OAuth2 or OpenID
Connect. The mechanics of user authentication, credentials gathering, and the like are a
completely private matter of the authorization server, as long as the eventual response
that comes back is in the format dictated by the standard. You can have multi-factor
authentication, multiple pages, one single page. It doesn't matter, as long as you come
out with a standard result.

4. Authorization Response
Once everything works out, you get an HTTP response with a 200 status code. This means
that you have successfully authenticated with the authorization server. The authorization
server will set a cookie that represents your session with it. So, if later on you need to hit
the authorization endpoint again, you will not have to enter credentials to sign in explicitly.
You might have to give more consent, for example, but you shouldn't have to re-enter
credentials.

The other important part to note here is the ID token, which is what we requested. It is
being returned as a parameter in the form post that we are getting. You can see in the
body of the HTML being returned that the JavaScript onload event is wired up to submit a
form automatically.


5. Send the Token to the Application
As soon as the page returned by the authorization server gets rendered, it's going to post
the form to our application. This means that the requested ID token is finally sent to my
web application.


6. Token Validation and Web App Session Creation
What happens now is pretty much the same thing that we studied earlier in the web sign-on
scenario in the first chapter, Introduction to Digital Identity. The application receives the
ID token and decides whether it likes it or not according to all the various trust rules, and
what it has learned from the discovery endpoint. If it likes it, the app will emit an HTTP
302 response with its own cookie. Thanks to that cookie, representing an authenticated
session with my app, I will not need to get the ID token again as long as the cookie is valid.
Together with the cookie creation, the 302 response redirects the browser to the original route
it requested.


7. Request Protected Route with Authorization
As the browser honors the redirect, we end up where we started: we are requesting a
protected route, but this time we present a session cookie with it.

If you compare the original request in step 1 with this one, you will discover that it is exactly
the same request but with a cookie coming along.

8. Access the Protected Route
Finally, after this long back-and-forth, we can get our response, which is an HTTP 200
response with a page in the body.

From now on, every subsequent request toward the application will carry the session
cookie, proving that there is an authenticated session in place.
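Steps 4 and 5 of the walkthrough can be sketched in a few lines of Python. The first function imitates the kind of self-submitting page an authorization server might return, and the second shows how the application side could pull the ID token out of the posted form body. Both function names and the sample values are hypothetical, and a real authorization server's page will look different in its details.

```python
from html import escape
from urllib.parse import parse_qs, urlencode


def render_form_post_page(redirect_uri: str, id_token: str) -> str:
    """Roughly what the authorization server returns in step 4: a form that posts itself."""
    return (
        '<html><body onload="document.forms[0].submit()">'
        f'<form method="post" action="{escape(redirect_uri, quote=True)}">'
        f'<input type="hidden" name="id_token" value="{escape(id_token, quote=True)}"/>'
        "</form></body></html>"
    )


def read_form_post(body: str) -> str:
    """Step 5, application side: extract the ID token from the URL-encoded POST body."""
    fields = parse_qs(body)
    return fields["id_token"][0]


# Hypothetical values, for illustration only.
page = render_form_post_page("https://app.example.com/callback", "aaa.bbb.ccc")
posted_body = urlencode({"id_token": "aaa.bbb.ccc"})  # what the browser would send
received_token = read_form_post(posted_body)
print(received_token)
```

Because the token travels in the body of a POST, it never appears in the query string or in the browser history.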


Anatomy of an ID Token 


As we said earlier, the ID token is an artifact proving that a successful authentication occurred.
We have two ways of requesting it: using a response_type parameter with the id_token value
and using a scope parameter with the openid value.

The reason for which we have two mechanisms is that the authors of the specifications wanted
to be able to use OpenID Connect even if your SDK was only based on OAuth2. In fact, at the
time of OAuth2, there was no ID token among the enumerated response types. Since scopes are
completely generic as a parameter, the ability to use one particular scope that would cause
the authorization server to return an ID token was a great way of being backward-compatible.
Today, it's a great way of getting confused, but now that you know, you no longer run this risk.


OpenID Connect defines a fixed format for the ID token, the JSON Web Token (JWT) format. The
specification actually defines not just the format but the list of claims that must be present in an
ID token. In addition, it even tells you in normative terms what you need to do in order to validate
some of those claims. As we said, if I include a profile or email value in the scopes of my request,
I will cause the content of the ID token to look different.


Just to get a feeling for it, here is what you would normally see on the wire:

[Encoded ID token: a long string made of three Base64-encoded segments (header, payload,
signature) separated by dots]

Figure 3.3


That’s what a JWT normally looks like, with its Base64-encoded components. If you go to
jwt.io, which is a very handy utility offered by Auth0, you can actually paste the bits of your ID
token and see it automatically decoded. The following picture shows an example of such decoding:

[Screenshot of the jwt.io debugger: the encoded token on the left and, on the right, its decoded
header (typ, alg, kid) and the beginning of its payload (nickname and the following claims)]

Figure 3.4


You can see on the right side that we have a header that describes the shape of this specific
JWT. In particular, by examining the header content, we find that this token is in JWT format,
which algorithm has been used for signing it, and a reference to the key required to validate the
signature, which in this case corresponds to the key that we downloaded from the discovery
endpoint (more on that in a moment).
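What a decoder such as jwt.io does with the first two segments can be approximated with the Python standard library. The sketch below only decodes; it does not check the signature, so its output must not be trusted for any access decision. The sample header and payload values are made up for illustration.

```python
import base64
import json


def b64url_encode(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    """Restore the padding that JWTs strip, then decode."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def decode_unverified(token: str):
    """Split a JWT into its three segments and decode header and payload (no validation)."""
    header_b64, payload_b64, _signature_b64 = token.split(".")
    return json.loads(b64url_decode(header_b64)), json.loads(b64url_decode(payload_b64))


# A toy token assembled locally; the values are illustrative, not taken from a real issuer.
sample_header = {"typ": "JWT", "alg": "RS256", "kid": "example-key-id"}
sample_payload = {"iss": "https://issuer.example/", "nickname": "jose+123"}
sample_token = ".".join([
    b64url_encode(json.dumps(sample_header).encode("utf-8")),
    b64url_encode(json.dumps(sample_payload).encode("utf-8")),
    "signature-not-checked-here",
])

header, payload = decode_unverified(sample_token)
print(header["alg"], header["kid"])
```

The kid value read from the header is what a middleware would use to pick the right key among those published by the authorization server.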


If you look at the payload, you'll find that it contains the actual information we were expecting to
retrieve. In more detail, we have:

✓ The issuer (iss), which is a string representing the source of the token, that is, the entity behind
the authorization server. Like the key, its expected value is found via the discovery endpoint.

✓ The audience (aud), which represents the particular application for which the token has been
issued. It is very important to check this claim. As an app receives this token, the middleware
used for validating it will compare what was configured to be the app identifier (in the case
of sign-in and ID tokens, that will correspond to the client ID of the app) with the audience
claim. If there is a mismatch, that means that someone stole a token from somewhere else,
and they're trying to trick the app into accepting it.

✓ The issued-at (iat) and expiration (exp) claims are coordinates used for evaluating whether
this token is still within its validity window or if, being expired, it can no longer be accepted.
We'll see during the API discussion that access tokens and ID tokens typically have a limited
validity time.

✓ All the other claims are pretty much identity information about the user, which is present in
the ID token only because I asked for profile and email in the scope parameter.
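The claim checks described above can be summarized in a short Python sketch. It assumes the signature has already been verified and that the claims are available as a dictionary; the function name, the leeway parameter, and the sample values are assumptions made for illustration, not the behavior of any particular middleware.

```python
import time


def validate_id_token_claims(claims, expected_issuer, client_id, expected_nonce,
                             now=None, leeway=0):
    """Check issuer, audience, expiration, and nonce of an already signature-checked ID token."""
    now = time.time() if now is None else now

    # iss: must be the issuer we trust (learned from the discovery endpoint).
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")

    # aud: must contain our client ID (the claim can be a string or a list of strings).
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if client_id not in audiences:
        raise ValueError("token was issued for a different audience")

    # exp: the token must still be inside its validity window.
    if "exp" not in claims or now >= claims["exp"] + leeway:
        raise ValueError("token has expired")

    # nonce: must match the value saved when the request was sent.
    if claims.get("nonce") != expected_nonce:
        raise ValueError("nonce mismatch")

    return claims


# Illustrative claims; every value here is made up.
sample_claims = {
    "iss": "https://issuer.example/",
    "aud": "client-1",
    "iat": 1000,
    "exp": 2000,
    "nonce": "n-123",
}
validated = validate_id_token_claims(
    sample_claims, "https://issuer.example/", "client-1", "n-123", now=1500
)
```

Raising an exception on the first failed check keeps the calling code simple: either the function returns the claims, or no session gets created.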


Principles of Token Validation 


We've been talking about validating tokens quite a lot, relying on the intuition that it entails
validating signatures and performing metadata discovery. Let's explore the matter in more detail,
and have a more organic discussion about what it means to validate tokens.

We have seen the function that tokens perform in a couple of scenarios. We have seen signing
in with SAML. We have seen access tokens for calling APIs, and in particular, right now, we have
seen how to use an ID token for signing in. All those scenarios entail an entity, the resource,
receiving a token and making a decision about whether it entitles the caller to perform whatever
operation the caller is attempting. How does the resource make that decision?


Subject Confirmation


The subject confirmation is a concept we inherit from SAML. In particular, the subject confirmation
method determines the way in which a resource decides whether a token has been used
correctly or not.


Bearer is the simplest. It is similar to finding 20 dollars on the floor. You pick up the money, go
wherever you want to use this money, use it, and you're going to get the good or service you
are paying for. No further questions will be asked because all it takes for using 20 dollars is to
own those 20 dollars and for them to change hands. That's the substance of the bearer subject
confirmation method. If you have the bits of a token in your possession, you are entitled to use
the token.


Proof of possession is something more advanced. In proof of possession, you have a token that
contains a key of some kind in some encrypted section. This encryption is specifically done for
the intended recipient of the token. The idea is that when a client obtains such a token, they
also receive a separate session key, the same key embedded in the encrypted section of the
token. When the client sends a message to the intended recipient, it attaches the token as in the
bearer case, but it also uses this session key to do something, like signing part of a message.

When the resource receives the token and the message, they will validate the token in the usual
way as we described for bearer. That done, they will extract the session key from the portion that
was encrypted for them. They'll use the session key to validate the signature in the message. If
the validation works, the recipient will know for certain that the caller is the original requestor
that obtained the token in the first place. Otherwise, they would not have been able to use
the session key.

This mechanism is more secure than bearer: an attacker intercepting the message would be
able to replay the token, but without knowledge of the session key, they would not be able to
perform the additional signature and provide proof of possession.
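The last part of the proof of possession mechanism can be illustrated with a small Python sketch. It deliberately leaves out the token itself and the encryption of the session key for the recipient: it assumes the resource has already extracted the same session key the client holds, and it uses an HMAC as a stand-in for "signing part of a message". The function names and the message are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign_message(session_key: bytes, message: bytes) -> str:
    """Client side: prove knowledge of the session key by computing a MAC over the message."""
    return hmac.new(session_key, message, hashlib.sha256).hexdigest()


def verify_message(session_key: bytes, message: bytes, signature: str) -> bool:
    """Resource side: recompute the MAC with the key extracted from the token and compare."""
    expected = hmac.new(session_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


# The session key the client received alongside the token (illustrative).
session_key = secrets.token_bytes(32)
message = b"GET /api/orders"
signature = sign_message(session_key, message)

# An attacker who replays the token but lacks the session key cannot produce this value.
print(verify_message(session_key, message, signature))
```

With a bearer token there is no equivalent of the second function: possession of the token bits is the whole check.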


Today, practically nobody is using proof of possession in OAuth2 or OpenID Connect. But proof
of possession is now coming back. There is a specification, still in draft, which shows how to use
the mechanism I just described in OAuth2 and OpenID Connect, but it is not mainstream at all.
That specification is not yet an approved standard.

So, to all intents and purposes, you can think of bearer tokens as being the law of the land. There
is another concept, the sender constraint, but I'll talk more about it when we deal with native
clients (Chapter 5, Desktop and Mobile Apps).


Format Driven Validation Checks 


In OAuth2, access tokens have no format. The standard doesn't specify any format mostly because
it was originally conceived for a scenario where the authorization server and the resource server
are co-located, and they can share memory.


Think, for example, of the scenario we described in the first chapter, where Gmail is the resource
server with its own APIs, and it's also the authorization server.

In that particular scenario, those two entities can share memory. They can have, for example,
a shared database. So, when a client asks for an access token, this access token can be just
an opaque string that happens to be the primary key in a specific table where the authorization
server saved the consent granted by the user to the client.

When the client makes a call to the resource server presenting this token, the resource server
grabs the token and just uses it for finding in the database the correct row and in it the consented
permissions. The resource server uses that information for making an authorization decision.
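The co-located scenario is simple enough to sketch in Python. Here a dictionary stands in for the shared table, and the token is nothing more than a random string used as the key of a row; all names and scope values are made up for illustration.

```python
import secrets

# Stands in for the table shared by the authorization server and the resource server.
consents = {}


def issue_access_token(user: str, client_id: str, scopes) -> str:
    """Authorization server side: record the consent and hand out an opaque token."""
    token = secrets.token_urlsafe(32)  # random string with no internal structure
    consents[token] = {"user": user, "client_id": client_id, "scopes": set(scopes)}
    return token


def is_allowed(token: str, required_scope: str) -> bool:
    """Resource server side: look up the row by token and check the consented permissions."""
    row = consents.get(token)
    return row is not None and required_scope in row["scopes"]


access_token = issue_access_token("alice", "client-1", ["read:mail"])
print(is_allowed(access_token, "read:mail"))
```

Nothing in the token can be inspected by the client or by anybody else: all the meaning lives in the shared store, which is why no token format is needed in this arrangement.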


This scenario is compliant with the spirit of the spec, and also the letter of the spec, and we
didn't need to mandate any specific format.

However, in the case of OpenID Connect, we did define a format for the ID token. We expected
the receiver to actually look inside a token and perform validation steps. This happens typically
when the resource server and the authorization server are not co-located, hence cannot use
shared memory to communicate. In those cases, you typically (but not always) rely on an
agreed-upon format.

Also, in the SAML case, we defined a format, a set of instructions on how to encode a token.


In the case of format-driven validation checks, there are certain constraints which apply pretty
much to every format, and in particular, to JWT:

✓ Signature for integrity. Your token is signed, and we have seen the reasons for which we want
to sign a token: being sure of the token origin and preventing tampering in transit. The token
must provide some indication about the key and the algorithm used in order for its recipient to
be able to check its signature.

✓ Infrastructural claims. Token formats will typically include infrastructural claims, meant to
provide information that the token recipient must validate to determine whether the incoming
token should be accepted. One notable example of those claim types is the issuer, which is to
say the identifier of the entity that issued (and signed) the token, and that should correspond to
one of the issuers trusted by the intended recipient. Another common infrastructural claim, the
audience, says for whom a token is meant. You need the audience to have a way of validating that
the token is actually for a specific recipient. You also need expiration time claims: tokens
typically have restricted validity so that there is the opportunity to revoke them.

Those are all claims that you would expect tokens to have and that the middleware is typically
responsible for validating.
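To show what "signature for integrity" means in practice, here is a self-contained Python sketch of signing and checking a JWT. One important assumption: the tokens discussed in this chapter are checked with a public key obtained through discovery, which requires a cryptography library. To stay within the standard library, this sketch swaps in HS256, the symmetric HMAC-based algorithm of the JWT family, where issuer and recipient share a secret. The function names and values are illustrative.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(header: dict, payload: dict, secret: bytes) -> str:
    """Issuer side: sign the encoded header and payload, append the signature."""
    signing_input = ".".join(
        b64url_encode(json.dumps(part).encode("utf-8")) for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_hs256(token: str, secret: bytes) -> dict:
    """Recipient side: refuse the token unless the algorithm and the signature check out."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")
    signing_input = header_b64 + "." + payload_b64
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("signature check failed")
    return json.loads(b64url_decode(payload_b64))


secret = b"demo-secret-shared-by-issuer-and-recipient"  # illustrative only
token = sign_hs256({"alg": "HS256", "typ": "JWT"}, {"iss": "https://issuer.example/"}, secret)
claims = verify_hs256(token, secret)
print(claims["iss"])
```

Only after this check succeeds does it make sense to look at the infrastructural claims, since an unsigned or tampered payload could say anything.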


Alternative Validation Strategy: Introspection 


There is a different way of validating tokens, which goes under the name of introspection. With
this approach, the resource receiving a token considers it opaque. It may happen because it
doesn’t have the capability to validate the token. It should be rare in the JWT case because
checking a JWT is pretty trivial, and it can be done in any dev stack. However, imagine that for
some reason, you cannot assume that incoming tokens are in a format that you know how to
validate.


You can take the incoming token and send it to the introspection endpoint, which is an additional
endpoint that can be exposed by authorization servers. Given that you connect to the introspection
endpoint using HTTPS, you can actually validate the identity of the server itself. You can be
confident that you are sending the token where it's meant to go, as opposed to a malicious site.
The authorization server examines the token, determines whether that token is valid or not, and
if it is valid, sends down the same channel the content of the token itself (e.g., claims).

In a nutshell, the resource server sends tokens back to the authorization server saying, "Please
tell me whether it's valid or not." The authorization server can render a decision and send it back
to the caller, along with the content of the token, so that the resource can peek inside.

Personally, I'm not crazy about introspection, mostly because it's brittle. You need to have the
authorization server up and available, and if your application is very chatty, you might get throttled,
for example. Also, with this approach, you need to wait for one extra network round trip
before you can actually make an access control decision about the resource that you're calling.
You might run out of outgoing HTTP connections, which typically live in a pool. It's a lot of work.

Sometimes there are no alternatives. But in general, for Auth0, given that we always use JWTs
and public key cryptography, normally, it's just better if you validate your own token at your API.
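For reference, this is roughly what an introspection call looks like from the resource server's point of view, following the shape defined by the OAuth 2.0 Token Introspection specification (RFC 7662): a POST carrying the token as a form parameter, answered by a JSON document whose active member says whether the token is valid. The endpoint address, the credentials, and the function names below are hypothetical, and the sketch only builds the request and interprets a response; it never touches the network.

```python
import base64
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_introspection_request(endpoint, token, rs_id, rs_secret):
    """Prepare the POST a resource server would send to the introspection endpoint."""
    credentials = base64.b64encode(f"{rs_id}:{rs_secret}".encode("utf-8")).decode("ascii")
    return Request(
        endpoint,
        data=urlencode({"token": token}).encode("ascii"),
        headers={
            # The resource server authenticates itself; HTTP Basic is one common option.
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def interpret_introspection_response(body: str):
    """Return the token content if the server says it is active, otherwise None."""
    result = json.loads(body)
    if not result.get("active", False):
        return None
    return result


# Hypothetical endpoint and credentials.
request = build_introspection_request(
    "https://issuer.example/oauth/introspect", "opaque-token-value", "rs-id", "rs-secret"
)
content = interpret_introspection_response('{"active": true, "scope": "read:mail"}')
print(request.get_method(), content["scope"])
```

Actually sending the request (for example with urllib.request.urlopen) is where the extra round trip and the dependency on the authorization server's availability come in.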


Metadata and Discovery 


The way in which token validation middleware discovers the values expected in valid tokens
is through the discovery endpoint. The middleware simply hits the URL
/.well-known/openid-configuration, which is defined by OpenID Connect, and retrieves validation
information according to the specification.

The document published at this URL typically contains direct information that we need to have,
like the issuer value, the address of the authorization endpoint, and similar. It also points
to a different file that contains the actual keys, which could be literally the bits of X.509 public
key certificates.
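The discovery mechanics just outlined can be sketched in Python as follows. The helper names are assumptions, the sample documents are heavily trimmed, and the key material is a placeholder rather than a real key. The fetch_json function shows how the download could be done, but it is not called here so that the example runs without network access.

```python
import json
from urllib.request import urlopen


def discovery_url(issuer: str) -> str:
    """Build the well-known discovery address from the issuer identifier."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def fetch_json(url: str) -> dict:
    """Download and parse a JSON document (used for both the configuration and the keys)."""
    with urlopen(url) as response:
        return json.load(response)


def find_key(jwks: dict, kid: str):
    """Pick the key whose identifier matches the kid found in a token header."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


# Trimmed, made-up examples of what the two documents contain.
sample_configuration = {
    "issuer": "https://issuer.example/",
    "authorization_endpoint": "https://issuer.example/authorize",
    "jwks_uri": "https://issuer.example/.well-known/jwks.json",
    "response_modes_supported": ["query", "fragment", "form_post"],
}
sample_jwks = {
    "keys": [
        {"kty": "RSA", "alg": "RS256", "kid": "key-1", "n": "placeholder-modulus", "e": "AQAB"},
    ]
}

signing_key = find_key(sample_jwks, "key-1")
print(discovery_url(sample_configuration["issuer"]), signing_key["alg"])
```

In a live setup, the middleware would call fetch_json on the discovery address, then on the jwks_uri value it finds there. A token whose kid matches none of the cached keys is a reasonable cue to download the key set again.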


Let’s take a look at how middleware extracts validation information from the discovery endpoint
by following the numbered steps in Figure 3.5.

1. Request Configuration
At load time, or even the first time that it receives a message, the middleware reaches
out to the discovery endpoint.

That’s a simple matter of making an HTTP GET request to the /.well-known/openid-configuration
endpoint of the authorization server.

2. Receive Configuration Document
What you get back is a big JSON document with all the values required to validate incoming
tokens.

For example, just to highlight some of these values, you have the address of the
authorization endpoint (authorization_endpoint), the value of the issuer (issuer), which is
the value that we are supposed to validate against, a list of claims which are supported
(claims_supported), the supported response modes (response_modes_supported), and
a pointer to the file where all keys are kept (jwks_uri).

3. Request Keys
The next step would be to actually make a GET request to the address at which the keys
are published.

4. Receive Keys
The result of that request will be another file containing a collection of keys with their
respective supported algorithm (alg), their identifier (kid), and the bits of the public key.
The middleware programmatically downloads all of that stuff and keeps it ready.
Those keys will occasionally roll, because it's good practice to change them. Your
middleware will simply have to reach out and re-download these keys when it happens.


Chapter 4 - Calling an API from a Web App


In this chapter, we move our attention to calling APIs. This is the quintessential scenario addressed
by OAuth 2.0: delegated access to APIs is the main reason OAuth came to be in the first place.

Most of the discussion will focus on the canonical grant OAuth 2.0 offers to address the delegated
API access scenario, the Authorization Code grant. We'll also take a look at other grants, such
as the Hybrid flow and the Client Credentials grant, which can be used to call APIs in slightly
different scenarios.


The Authorization Code Grant 


At a high level, the way we typically invoke an API from a web application is roughly the same
way we'd call an API from any client flavor. Details will differ, as we will see throughout the book.

Depending on the client’s flavor, we'll use different grants, with different properties. In particular,
in this chapter, we want to focus on the scenarios in which a web application calls an API from
its server-side code. For that purpose, we use the OAuth 2.0 authorization code grant. The
authorization code grant, code grant from now on for brevity, empowers one web application to
access an API on behalf of a user and within the boundaries of what the user granted consent
for. This is the grant we encountered when introducing OAuth 2.0 in Chapter 1.

In the section Layering Sign In on Top of OAuth2: OpenID Connect of Chapter 1, we've seen that
some people tried to stretch this grant to achieve sign-in, as opposed to invoking an API. In the
same section, we have seen how if you just use this grant to obtain and use access tokens for
signing in, things don’t work out that well. We have seen how OpenID Connect is layered
on top of this grant to achieve sign-in the right way, and we'll have more considerations about it
in this chapter. At this point, I just want to stress that what we are looking at in this chapter is
aimed at calling APIs, and not at signing in.


Another important concept to grok upfront is that the code grant will only empower an 
application to do up to as much as the user can already do and no more. If anything, 
the application will usually end up having fewer access rights. Users cannot use the 
code grant for granting application access to the resources the users themselves 
don't own or have the rights for. When thinking about OAuth 2.0 and the code grant, 
in particular, it's easy for people to get confused. They observe that APIs grant
access to a call depending on the presence of scopes in the token. That lends
credence to the belief that the scopes themselves are what grants the client the privileges
to access the resource. Actually, the scopes select which of the privileges the user already
has are being delegated to the client.


I just want to stress that the authorization code grant is a delegated flow. It allows
clients to do things on the user's behalf, which means that the user's capabilities are a
hard limit for what an application can do on the user's behalf. In other words, a client
obtaining a token via code grant cannot do more than the user can do. If you need a
client to do more than the user can do, which is a common scenario, then you need to
switch to a different flow in which permissions are granted directly to the application
that needs them, with no user involvement. Clear as mud? Don’t worry. We'll revisit those
points later in the chapter.





In the last chapter, we explored how to perform web sign in through the front channel, which 
afforded us the luxury to implement the full scenario without any secrets. As you witnessed in the 
detailed descriptions of flows and network traces, no secret came into play. In the authorization 
code grant, however, the use of an application credential such as a client secret is inevitable. 
Whenever the web app redeems an authorization code, it needs to authenticate as a client to the 
authorization server. The way in which we will approach the delegated API invocation scenario
will vary depending on whether one needs to access the APIs only while a user is present and
currently signed in to the application, or whether one needs to acquire permanent access to the
APIs and perform calls to these APIs even when no user is present.


My favorite example is an application that can publish tweets at an arbitrary time. Personally, I
don't like to wake up early in the morning: I really hate it. Nonetheless, it turns out that tweets
get the best exposure when they come out pretty early. The fact that I’m based on the West
Coast makes things even worse: if I had to publish tweets manually at a time that counts as
morning across all of North America, I'd have to wake up really early. Luckily, there
are applications I can use for tweeting on my behalf at whatever time I schedule beforehand.
Those applications are a typical example of a client that needs to have an access token always
available for calling the Twitter API on my behalf, regardless of whether I am currently signed
in to an active session or blissfully still asleep. This is one of the classic scenarios, offline access,
demonstrating the need for and intended usage of a very important artifact - the refresh token.


Once again, we'll explore this scenario in detail in this chapter.


Without further ado, let's dive into the details of the authorization code grant with the help of
the diagram in Figure 4.1.


The diagram depicts the usual actors we encountered in Chapter 2:


- On the far left, the user and their browser
- The authorization server, on top. Note that this time both the authorization and the token
  endpoints are present in the picture, as both will come into play
- A web application roughly in the middle
- The API the web app needs to call as part of our scenario


Just like we did during the first explanation of the OAuth2 flow in Chapter 1, section Delegated 
Authorization: OAuth2, here we assume that the user is already signed in with the web application. 
We don't know how that sign-in operation occurred, and we don't care in this context - the API 
invocation operation can be performed independently of the sign-in (although we will later see, 
in the section on hybrid flow, that there are potential synergies there). Let’s examine the message 


sequence in detail. 


1. Route Request
The user hits a route of the web application that, in our sample scenario, allows the user to 
book an appointment. Booking an appointment happens to require accessing the booking 
API on behalf of the user; hence, accessing that route causes the web app to generate a 


request for delegated access. 


Note that if you compare this with the equivalent step in the flow described in Chapter 3, section
The Implicit Grant with form_post for the sign-in operation, you will notice that the web
app does not have a middleware in front to intercept the route request. In this case, the
route isn’t the asset we want to protect: requesting that route just happens to be the thing
that triggers the need to acquire a token to call an API. The logic necessary to generate
the associated delegated authorization request is, in fact, inside the app codebase itself
(although it will often be implemented by an SDK, rather than from scratch).


2. Authorization Request 
The reaction from the application to the request is somewhat familiar: a 302 HTTP status
code response with a message for the authorization server. You can see a number of
differences with the equivalent step 2 in section The Implicit Grant with form_post of Chapter 3.


First, we are setting a cookie to track the nonce value (see Chapter 3, section Authorization
Request Redirect for more details), as besides the access token needed for accessing
the API, we'll also be asking for an ID token. The ID token is useful in this flow for knowing a
bit more about the transaction, given that the access token itself is opaque to the client.
More details later in this chapter.
Next, in the captured trace message, we have the authorization endpoint. 


Ignoring the audience parameter for a second, the next entry is the client_id - representing 


the client ID identifying the web app at the authorization server. 


The response_type for this particular grant is code. We want to obtain a code from the 
authorization endpoint, which the web app will later exchange via token endpoint for an 


access token. 


We don't need to specify the response mode because we are okay with a default response 
mode, which in the case of code response type is query - meaning that we expect the 


authorization server to return the authorization code in a query string parameter.


Next, we find the scope parameter. This message includes all the same scope values
encountered earlier, openid, profile, and email, indicating that we require an ID token
alongside the code. This time, however, we aren’t requesting an ID token for sign-in
purposes: we just want to have some information about who is the resource owner
granting permission in this transaction. Without an ID token, that is to say, something the client
itself can consume, we would have no way to know. We'd just blindly get an access token
and use it, with no indication about the identity of the user who obtained it.


The scopes collection includes a scope value we didn’t encounter yet, read:appointment. 
That scope value represents a permission exposed by the API we want to invoke: in other 
words, one of the things that can be done when using that particular API, and that can 
be gated by an authorization check. By presenting that scope value in the authorization 
request, the client is saying to the authorization server: “This web application wants to 
exercise the read:appointment privilege on behalf of the user”. That's something that the 
authorization server needs to know. It will determine important details in the way the 
request is handled, such as the content of the consent prompt presented to the user and 


the actual outcome in granting the delegated permissions. 
The next parameter represents the redirect URI, which you are already familiar with. 


The last parameter in the captured message is the nonce, a token injection prevention 


mechanism we already encountered earlier in the book. 


Now that we covered every message parameter in detail, let’s revisit the audience
parameter. When requesting an access token for an API protected by Auth0, a client is
required to specify one extra parameter, called audience, indicating the identity of the
resource the client is requesting access to.


The core OAuth2 specification does not contain any parameter performing this function,
mostly because there is an underlying assumption (though not a requirement) that resource
server and authorization server are co-located. This assumption makes it unnecessary
to identify which resource server the request refers to. For a concrete example of this
scenario, consider how Facebook uses OAuth2 for gating access to its Graph API. The
Facebook authorization server can only issue access tokens for the Facebook Graph API;
there is no other resource server in the picture. The only latitude left to clients is to specify
different scopes for that one resource server, the Facebook Graph. Different scopes will
express different permissions and operations I intend to exercise, but they will all refer to
the same resource server, which doesn’t need to be explicitly named in the authorization
request. Similar considerations hold for Google, Dropbox, and other popular services:
whenever clients get tokens from those services, they are always calling the provider's own
APIs, whose identity is self-evident from the context without requiring an identifier in
the request.


When the solution includes a third-party authorization server, as in the case
of an Auth0 customer leveraging the Auth0 authorization server to secure its own custom
API, the topology makes it possible for the same authorization server to be used to gate
access for a multitude of resources, which can all live in different places. In that scenario,
the client does need the ability to specify which resource it intends to request access to.


There are multiple ways a message could be constructed to include explicit references 
to a particular resource server. For example, an API might embed a resource server 
identifier in individual scope strings themselves. However, the approach has issues: scope 
strings could get really long and hard to read. Also, including multiple scopes referring to 
different resources in the same request might generate ambiguity about which resources 


the resulting access token could be used with.


Given those complications, Auth0 and other identity vendors decided to introduce a
dedicated parameter for identifying resources. Azure AD, for example, has a resource
parameter whose semantics are equivalent to Auth0’s audience.


Since those individual vendor decisions were made, the IETF OAuth2 working group
officially recognized the usefulness of such primitives and issued a new specification,
OAuth2 Resource Indicators. This specification extends OAuth2 with a resource parameter,
which is, to all intents and purposes, equivalent to Auth0’s audience. We plan to start
accepting those standard parameters too in a future update.
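

To make the parameter walkthrough of this step concrete, here is a minimal Python sketch of how a web app might assemble the authorization request it places in the 302 response. It is an illustration under assumptions, not the code of any particular SDK: the /authorize path, the placeholder client ID, and the audience value are invented for the example, while the tenant host and redirect URI are borrowed from Figure 4.2.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit


def build_authorization_url(authorize_endpoint, client_id, redirect_uri,
                            audience, scopes, nonce):
    """Return the URL the web app sends back in the Location header of its 302."""
    params = {
        "audience": audience,        # which resource the token is for
        "client_id": client_id,      # identifies the web app at the authorization server
        "response_type": "code",     # we want an authorization code back
        "scope": " ".join(scopes),   # scope values are space-delimited
        "redirect_uri": redirect_uri,
        "nonce": nonce,              # also tracked in a cookie by the app
    }
    return authorize_endpoint + "?" + urlencode(params)


nonce = secrets.token_urlsafe(16)
url = build_authorization_url(
    authorize_endpoint="https://flosser.auth0.com/authorize",  # assumed path
    client_id="YOUR_CLIENT_ID",                                # placeholder
    redirect_uri="http://localhost:3000/callback",
    audience="https://api.example.com/appointments",           # hypothetical identifier
    scopes=["openid", "profile", "email", "read:appointment"],
    nonce=nonce,
)

# Parse the query string back to inspect what the authorization server will see.
sent = parse_qs(urlsplit(url).query)
print(sent["scope"][0])
```

Note that no response_mode appears in the request: as discussed, the default for the code response type is query.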


3. 302 Redirect Execution
Next, the browser executes the 302 HTTP status code redirection sending the message 


we examined on its way toward the authorization endpoint. 


4. Authorization Response
Upon receiving the authorization request, the authorization server takes care of the
interactive portion of the flow. The authorization endpoint decides what's necessary for
authenticating the user and goes through it. Then it presents the user with a consent prompt
saying: "Hey, client X wants to read appointments on your behalf." The moment in which
the user grants consent, the authorization endpoint returns its response with the requested
authorization code in the query string, in accordance with the response_type we asked for.
Also, the response includes the usual set-cookie command with which the authorization
server records in the browser that an authentication session has been established.





5. Providing the Authorization Code to the Web App 
At this point, the browser simply executes the redirect that will dispatch the authorization 
code to the web application. From this moment on, the web application will continue the 


flow on the server side. 
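

As a small illustration of this hand-off, the sketch below shows how server-side code might pull the authorization code out of the query string of the callback request. The function name is hypothetical, and the check for an error parameter reflects the error response defined in the core OAuth2 specification rather than anything shown in our trace.

```python
from urllib.parse import parse_qs, urlsplit


def extract_authorization_code(callback_url):
    """Pull the `code` query parameter out of the URL the browser was redirected to."""
    query = parse_qs(urlsplit(callback_url).query)
    if "error" in query:
        # The authorization server reports failures (e.g. consent denied) this way.
        raise ValueError("authorization failed: " + query["error"][0])
    codes = query.get("code", [])
    if len(codes) != 1:
        raise ValueError("expected exactly one authorization code")
    return codes[0]


# The code value is the sample one from Figure 4.2.
code = extract_authorization_code(
    "http://localhost:3000/callback?code=AgZ_tUwVI_gL1AGb"
)
```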


6. Redeeming the Authorization Code 
The web application combines the authorization code with its own client credential and
sends them in a message to the token endpoint. The message to the token endpoint is in the
form of an HTTP POST request where the app presents its client_id and client_secret, the
authorization code received from the front channel, and a new parameter, the grant_type.
The message layout is shown, annotated, in Figure 4.2.


    POST https://flosser.auth0.com/oauth/token HTTP/1.1
    {
      "client_id" : "xHGI52zgwY@nuxtfSQelaFAwxxHUMB_",
      "client_secret" : "CgBf@AQBC[, SNIP, ]D1ZMxWxk3ZA6bh",
      "code" : "AgZ_tUwVI_gL1AGb",
      "grant_type" : "authorization_code",
      "redirect_uri" : "http://localhost:3000/callback"
    }

Annotations:
- https://flosser.auth0.com/oauth/token: the token endpoint
- client_id: the identifier of the app at the AS
- client_id and client_secret: the app credentials at the AS
- code: the artifact to redeem according to the required grant
- grant_type: the grant I want the AS to perform
- redirect_uri: where the app received the artifact to redeem

Figure 4.2





Every time an application talks to the token endpoint, it has to specify the desired grant 
type letting the authorization server know how to interpret the request. In this particular 
case, the desired flow is the authorization_code grant. That tells the authorization server 
to search for an authorization code in the message, and to consider the client ID and secret 
in the context of this specific grant. If, for example, the request had specified
client_credentials as the grant type, a flow we'll discuss later on, then the authorization
server would have ignored the authorization code, would have looked only at the client
ID and client secret, and would have considered only the identity of the client application
itself rather than the consent options of the resource owner implied by the authorization
code. In other words, the grant_type parameter is used to disambiguate the flow the client
expects the authorization server to perform.


The request also includes the audience for the reasons stated earlier. In this particular 
case, audience is redundant. The authorization code has been granted in the context of 
that audience, and the authorization server knows it - hence there’s no need to provide 
it again in this request. However, some extra clarity can be beneficial: for example, this 
helps to interpret what this request is for while examining a network trace, without the 


need to correlate it with the earlier messages that led to this point. 


Finally, the message contains a redirect_uri parameter. In this phase, the authorization
server doesn't really have any opportunity to perform redirects, given that the client
is talking to the authorization server via a direct channel. Rather, the redirect_uri is used
as a security measure to prevent redirection URI manipulation - the authorization server
will verify that the redirect_uri presented here is identical to the one provided during the
authorization code request leg of the flow, preventing an attacker from performing URI
replacement (see https://tools.ietf.org/html/rfc6749#section-10.6).


7. Receiving the Access Token in the Token Endpoint Response
Assuming that the request is accepted by the authorization server and processed without
issues, the grant concludes with a response message carrying the artifact originally
indicated by the response_type in step 2 - Authorization Request, in this case, an access
token. Here’s a breakdown of the response message content:


- The requested access token
- An ID token, in response to the presence of openid in the list of requested scope values
- The token type, which is always Bearer for the time being - as discussed in the token validation
  section
- The expires_in parameter, expressing the time through which the access token should be
  considered valid. Although at times the access token itself might contain that information, and
  happen to be in a format that can be inspected, access tokens should always be treated as
  opaque by clients. As such, expires_in needs to be provided as a parameter in the response
  for the client to be able to use that information (for example, for deciding for how long an
  access token should be cached).


Important: access tokens should always be assumed and treated as opaque by client 
applications because their content and format are a private matter between the authorization 
server and the resource server. The terms of the agreement between the authorization 
server and the resource server can change at any time: if the client app contains code 
that relies on the ability to parse the access token content, even minor changes will break 


that code - often without recourse. 





Imagine a case in which access tokens, initially sent in the clear, start being encrypted in
a way that only the intended resource recipient can decrypt. Any client will lose access
to the token’s content. Client code relying on the ability to access the token content will
irremediably break.


In summary, avoid logic in client applications that inspects the content
of access tokens. Note that examining the content of a token in a network trace is perfectly
fine for troubleshooting purposes, as the information will be consumed via debugging
tools, without generating code that can break in the future.
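

Steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the request body mirrors the JSON layout of Figure 4.2, and the sample response values are invented. The point to notice is that the client reads expires_in from the response and never looks inside the access token.

```python
import json
import time


def build_token_request(client_id, client_secret, code, redirect_uri):
    """Return the JSON body of the POST sent to the token endpoint (step 6)."""
    return json.dumps({
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",  # tells the AS which flow to run
        "redirect_uri": redirect_uri,        # must match the one used in step 2
    })


def parse_token_response(body, now=None):
    """Read the token endpoint response (step 7), keeping the access token opaque."""
    now = time.time() if now is None else now
    payload = json.loads(body)
    if payload.get("token_type", "").lower() != "bearer":
        raise ValueError("unexpected token type")
    return {
        "access_token": payload["access_token"],    # stored as-is, never parsed
        "id_token": payload.get("id_token"),        # present because openid was requested
        "expires_at": now + payload["expires_in"],  # cache deadline from the response
    }


request_body = json.loads(build_token_request(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET",  # placeholders
    "AgZ_tUwVI_gL1AGb", "http://localhost:3000/callback",
))

# Invented sample response, with a fixed clock so the arithmetic is easy to follow.
sample = '{"access_token": "opaque", "id_token": "x.y.z", "token_type": "Bearer", "expires_in": 86400}'
tokens = parse_token_response(sample, now=1000)
```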


8. Using the Access Token to Call the API
Once the client obtains the requested access token, it can finally invoke the API: all it needs
to do is to include the access token bits in a classic REST call. In this particular example,
the call is a GET, but any REST invocation style is possible. The key feature in that message
is the Authorization HTTP header, exhibiting the Bearer authentication scheme, carrying
the bits of the access token.


The OAuth2 Bearer Token Usage specification, the document describing how to use bearer 
tokens obtained through OAuth2 for accessing resources, says that it's possible to place 
the token elsewhere in the outgoing request, for example in the body of a call - or even 
a request link, as a query parameter. Encountering clients that send tokens in the body is 
very rare. The use of the query string for sending access tokens is actively discouraged, 
as it has important security downsides. Consider the case in which your client is running 
in a browser: whenever a token is included in the query string, its bits will end up in the 
browser history. Any attack that can dump the browser history will also expose the token. 
Moreover, if the API call is immediately followed by a redirect, the query string will be
available to the redirect destination host in the Referer header: once again, that will expose
the token outside of the normal client-resource exchanges.


For those and other reasons, it is reasonable to expect that the near totality of the API calls
encountered in the wild that rely on OAuth2 will use the Authorization HTTP header.
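

Here is a minimal sketch of what placing the token in the Authorization header looks like, using Python's standard urllib. The API URL and the token value are made up for illustration, and the request is only constructed, not sent.

```python
import urllib.request


def build_api_request(api_url, access_token):
    """Prepare a GET whose Authorization header carries the access token as a bearer token."""
    request = urllib.request.Request(api_url, method="GET")
    # The token travels in the header, never in the query string.
    request.add_header("Authorization", "Bearer " + access_token)
    return request


api_request = build_api_request(
    "https://api.example.com/appointments",  # hypothetical API endpoint
    "opaque-access-token",                   # placeholder token value
)
```

Passing api_request to urllib.request.urlopen would perform the actual call.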


Authorization Code Grant and PKCE 

The latest OAuth2 Security Best Current Practice (BCP) documents suggest that every 
authorization code flow should leverage Proof Key for Code Exchange (RFC 7636), an extension 
to the authorization code grant meant to protect authorization code from being stolen in transit. 
PKCE was originally devised for public clients, where it performs essential security functions that 
we'll describe in detail in the next chapter. Its use for confidential clients is not as critical, as there 
are other measures already in place (state, nonce checks) mitigating other aspects coming into 
play in associated attacks. This is why we have chosen to keep this section light and to defer
introducing PKCE to the next chapter, as you will be more familiar with the original grants, and
it will be easier to add PKCE as an incremental step. However, we wanted to make a note of the
BCP guidance already here, so that if you read about it elsewhere you'll know what it is all about.


Sidebar: Essential Authorization Concepts and Terminology 


OAuth 2.0 offers a delegated authorization framework. Unfortunately, developers often disregard 
the “delegated” part, and attempt using OAuth primitives and flows to solve pure authorization 
scenarios that the protocol hasn't been explicitly designed to address. The outcome is solutions 
that might appear to work in toy scenarios but fall short as soon as the approach is applied in 


more realistic settings. 


For that reason, it is a worthwhile investment to spend a few paragraphs discussing essential
concepts and terminology in authorization, spelling out explicitly their relationship with OAuth -
and in particular, what is part of OAuth and what is instead a property of the underlying resources
we are exposing.


Permissions 

Imagine that you want to expose programmatic access to an existing resource. Depending on
the nature of the resource, there will be varying sets of operations that can be performed on it,
or with it. In the context of a document editing system, users will be able to see, read, comment
on, or modify documents. An API fronting a printer might expose the ability to print in black and
white or in color. Any kind of resource will have a set of permissions that make sense for that
particular resource, and that can be allowed or denied for a particular caller. A permission is just
that, a statement describing the type of things that can be done with a resource: document.read,
document.write, print:bw, print:color, mail:read, mail:send, and so on.


Permissions describe intrinsic properties of resources, which exist regardless of how those 
resources are exposed. OAuth2 solutions might surface them if they happen to be useful in the 
context of a delegated authorization solution involving those resources. Still, in the general case, 


permissions exist in their own right and will be used outside of OAuth as well. 


Privileges

A privilege is an assigned permission: it declares that a certain principal (say, John) can perform
a certain operation on a given resource (say, calling the printer API to print in full color).


As was the case for permissions, the concept of privilege exists independently of OAuth 2
(or any other higher-level protocol, for that matter). For example, the framework necessary to
describe privileges needs primitives for principals (users and apps to whom permissions might
be assigned), which OAuth 2 does not define.


The existence of permissions and privileges applied to a set of resources will influence the
behavior of OAuth 2 solutions based on those resources, but how that will happen is not described
directly in the protocol and messages defined in the OAuth 2 specification.


Scopes 

Finally, we get to talk about an OAuth primitive. In the case in which a resource needs to be 
exposed in the context of a delegated authorization solution, the scope is the primitive that 
enables a client application to request exercising the privilege of a user for a particular permission 
for a given resource. The mechanism that the client uses for expressing this to the authorization 
server is by including in an authorization request the scopes corresponding to the permissions 
being requested. When used with this semantic - that is, lists of permissions for a given resource 
- scopes are used to define the subset of user privileges that a client application wants to 
exercise on behalf of the user. Note that the scopes can be used for other purposes: we have 
seen examples of that in the case of openid (requesting the presence of an extra artifact, in that 


case, the ID token) or profile, email (influencing returned content). 


Effective Permissions 

Consider a classic delegated authorization flow in which a client asks the authorization
server for access to a resource. In particular, the client specifies what permissions will be required for
the operations it intends to perform on the resource. Upon receiving the request and authenticating
the user, the authorization server will typically prompt the user to grant the app delegated access
to the corresponding permissions. The user granting consent through that prompt is effectively
saying "Yes, I'm okay with this particular client exercising on my behalf the privileges being
requested".


Say, for example, that the client implements an email solution, and the permission it requests is
mail.read. The scope requested is mail.read and the access token being returned will include (by
value or by reference, depending on the format) mail.read.


Once the client obtains the access token, it will use it to make a call to the API, requesting to read 
a list of email messages. The middleware protecting the API, upon receiving and validating the 
access token, will verify that the scope it carries includes mail.read, the permission required by 


the API to perform the read operation requested, and allow the request to move along. 


But the authorization checks aren’t over yet! Imagine that the client requests the list of emails 
from the inbox of a user who's different from the user who granted consent and obtained the 
access token. Should the API allow the request to succeed? Of course not! Scopes do not create 
privileges where there are none. Scopes can grant to a client a subset of the privileges a resource 
owner has on a resource but can never add privileges the resource owner didn’t have to begin 
with. The effective permissions are the intersection of the privileges a resource owner has 
and the scopes that have been granted to the client. The effective permissions represent what 
a client can actually do, and that can be a subset of what’s declared in the scopes. You always 
need to check at runtime whether the scopes represent something that the resource owner 
can actually do for the resource being accessed. Also, note that there is no guarantee that the 
privileges the resource owner had at the moment of granting consent will be preserved forever. 
Hence, even if your authorization server conflates scopes and privileges (for example, by only 
allowing a user to consent if he or she possesses the corresponding privileges), nothing prevents 
some of those privileges from being revoked at a later time. This makes it necessary for the API 


to check rather than just relying on the scopes in the incoming access token. 
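

The two-part check described above can be sketched as follows. This is a toy model, assuming a hypothetical in-memory privilege store and invented resource names: a real API would consult its own data or policy engine. What matters is that the scope check and the privilege check are both evaluated at call time.

```python
# Hypothetical privilege store: user -> set of (permission, resource) pairs.
PRIVILEGES = {
    "alice": {("mail.read", "inbox/alice"), ("mail.send", "inbox/alice")},
    "bob": {("mail.read", "inbox/bob")},
}


def is_allowed(token_scopes, token_user, permission, resource):
    """Allow the call only if the scope was granted AND the user holds the privilege now."""
    scope_granted = permission in token_scopes
    user_has_privilege = (permission, resource) in PRIVILEGES.get(token_user, set())
    return scope_granted and user_has_privilege


# Alice consented to mail.read only.
alice_scopes = {"mail.read"}
own_inbox = is_allowed(alice_scopes, "alice", "mail.read", "inbox/alice")
other_inbox = is_allowed(alice_scopes, "alice", "mail.read", "inbox/bob")
send_mail = is_allowed(alice_scopes, "alice", "mail.send", "inbox/alice")
```

Reading Alice's own inbox succeeds. Reading Bob's inbox fails because the scope cannot create a privilege Alice never had, and sending mail fails because Alice holds that privilege but never delegated it to the client.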


This is one subtle point that is often misunderstood in the context of OAuth.


Note that OAuth can also be used for application to application flows, in which no user is involved.
The client obtains an access token for a resource from the authorization server only through
its own client credentials, as opposed to requesting access on behalf of a resource owner. You
could say that in those scenarios, the client application itself is the resource owner: there is no
delegation, hence there’s no need for scopes to limit the privileges involved. We will study the
corresponding OAuth 2 grant, the client credentials grant, in a later section in this chapter. In this
case, it's not completely clear how permissions are expressed, as the core OAuth 2 specifications
don’t provide any mechanism to express assigned privileges (though there is a new specification,
the JWT Profile for OAuth2 Access Tokens, that does introduce some guidance about that).
Regardless of the implementation details of how those privileges are expressed, this is a case
in which privileges are actually carried in the token. There might be other cases where the
authorization server includes user privileges, roles, group memberships, and other authorization
information in the access token. Those cases are all valid and represent real, important scenarios.
However, they aren't described by the specifications we are studying in this book, so we will not
add further details here.


Finally, consider that although scopes often map to permissions, that is not always the case.
Remember the openid scope? Its presence in a request just causes an ID token to be included in
the response from the authorization server. Or think about the profile scope, which, when added
to a request, causes the ID token to include claims that wouldn’t be present otherwise. So, it's
easy to read too much into the mapping between permissions and scopes. Scopes do correspond to permissions
in many common cases, which might erroneously create the belief that scopes and permissions
are the same concept, but in fact, it’s important to remember that they aren't.





The Refresh Token Grant 


Let's now go back to grants. I mentioned this in passing earlier on: tokens typically have an
expiration time. They have an expiration time because a token is caching a number of facts and
user attributes, and those facts might change after the token has been issued.


Also, the ability of a client to obtain a token at a given time doesn’t guarantee that the same 
client will be able to reobtain the same token in the future. For example, the resource owner 
might visit the authorization server and revoke consent for that client to obtain tokens with the 
scopes previously granted. This makes the content of any previously issued tokens obsolete as
they no longer reflect the current situation. The idea is that by endowing tokens with a short
duration, we ensure that the client cannot really use them (and hence, the information they
cache) for too long. Upon token expiration, clients will be forced to call back home and repeat
a request to obtain a new token. This new request creates the opportunity for the authorization
server to issue a new token containing up to date information or to refuse to issue a new token
if conditions changed (e.g., the user account has been deleted from the system). The shorter
the token validity interval, the more up to date the issued information will be. Solutions typically
seek compromises that balance that with performance and traffic considerations.
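

One way to picture the client side of this trade-off is a small cache entry that knows when it is time to call back home. The sketch below is hypothetical: the class name, the method name, and the 60-second leeway are assumptions made for illustration, and the lifetime comes from the expires_in value of the token response.

```python
import time


class CachedToken:
    """Holds an opaque access token together with the deadline derived from expires_in."""

    def __init__(self, access_token, expires_in, issued_at=None):
        self.access_token = access_token
        self.issued_at = time.time() if issued_at is None else issued_at
        self.expires_at = self.issued_at + expires_in

    def needs_renewal(self, now=None, leeway=60):
        """True once the token has expired, or will expire within `leeway` seconds."""
        now = time.time() if now is None else now
        return now >= self.expires_at - leeway


# A token valid for one hour, issued at time 0 to keep the arithmetic simple.
cached = CachedToken("opaque-access-token", expires_in=3600, issued_at=0)
fresh = cached.needs_renewal(now=100)    # well within the validity interval
stale = cached.needs_renewal(now=3590)   # inside the leeway window
```

A shorter expires_in makes needs_renewal flip sooner, which means fresher information at the price of more round trips to the authorization server.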


Of course, this brings another challenge, which is: although we do want up to date information, we 
don't want to give users a bad experience to achieve that. The user should be blissfully unaware 
of all the low-level mechanisms unfolding behind the scenes to achieve those updates. We need 
to empower clients to renew tokens in a way that does not impact the user experience. The way
in which OAuth solved this is by introducing a new artifact, which we call the refresh token, and
associated grants that use it to handle token renewals without displaying prompts.


The first step to work with refresh tokens is to request one. The OAuth 2 core specification
doesn’t define a mechanism to request refresh tokens, leaving the decision to issue one to
individual authorization servers. However, OpenID Connect does define a mechanism to request
refresh tokens, and the result is that a large number of OAuth 2 authorization servers adopt that
mechanism as their main (or even only) way of requesting refresh tokens.


Let’s revisit the authorization code grant examined in an earlier section and add a few small
changes, as shown in Figure 4.3.

The original message in step 3 carried the list of scope values the client required to request an
ID token with rich attributes content (openid, profile, email) and the access level required for the 
operations the client intends to perform (read:appointment). The message in step 3 in Figure 
4.3 contains an extra scope value, offline_access. This is a scope value defined in the OpenID 
Connect core specification: its presence in a request asks an authorization server to include a 
refresh token in its token endpoint response, alongside all the other artifacts (in this case, an ID 
token and an access token). In particular, the validity of that refresh token will extend beyond 
the duration of the authentication session within which it has been issued. Don’t worry if that’s 


not very clear for now. We'll expand on what that means later in this section. 


If you observe step 7 in the diagram, you'll see that as expected, the authorization server returns 


a refresh token along with the usual access token and the ID token. 
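As a rough illustration of the modified request, the sketch below assembles an authorization request URL that carries offline_access next to the scope values discussed above. The endpoint, client_id, redirect_uri, and state values are made up for the example; only the scope values come from the text.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scopes, state):
    """Build an authorization code request; scope values are space-delimited."""
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    })
    return f"{authorize_endpoint}?{query}"


# Hypothetical endpoint and client registration values.
url = build_authorization_url(
    "https://issuer.example.com/authorize",
    "my-client-id",
    "https://app.example.com/callback",
    ["openid", "profile", "email", "read:appointment", "offline_access"],
    "opaque-state-value",
)
requested_scopes = parse_qs(urlparse(url).query)["scope"][0].split(" ")
```

The only difference from the earlier request is the extra offline_access entry in the scope list.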


Now the client has a refresh token in its possession. Let's take a look at how the client uses 
it, and in particular how the refresh token makes it possible to get new access tokens without 
prompting the user again. The entire flow occurs on the server side, as it entails the client (in this 
case, a web app whose code runs on the server) connecting directly to the token endpoint of the 
authorization server. The browser, used to send the request and drive the interactive portions 


of the transaction, is now entirely out of the picture. Follow the numbered steps in Figure 4.4.

Figure 4.4

1. Request Configuration
The first leg of the grant takes the form of a typical token endpoint request, analogous to
the code redemption request described earlier in the chapter.
Examining the request, you'll encounter the following parameters:
• The usual client_id.
• The client_secret. This is a confidential client, hence requests to the token endpoint require
  the client app to identify itself.
• The new refresh_token parameter, carrying the refresh token bits received earlier.
• The grant_type. As mentioned earlier, every request to the token endpoint must specify the
  grant the client intends to use. In this case, the parameter value is refresh_token.
• The redirect_uri parameter, included for the same security reasons specified in the code
  redemption flow description.
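The parameters listed above can be sketched as a form-encoded request body, which the client would POST to the token endpoint. All values below are placeholders. The sketch treats redirect_uri as optional, on the assumption that not every authorization server expects it: the refresh request defined in RFC 6749 itself only lists grant_type, refresh_token, and an optional scope.

```python
from urllib.parse import urlencode, parse_qs


def build_refresh_request_body(client_id, client_secret, refresh_token, redirect_uri=None):
    """Return the application/x-www-form-urlencoded body of a refresh token request."""
    params = {
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }
    # Assumption: only send redirect_uri when the authorization server asks for it.
    if redirect_uri is not None:
        params["redirect_uri"] = redirect_uri
    return urlencode(params)


# Placeholder values, for illustration only.
body = build_refresh_request_body(
    "my-client-id",
    "my-client-secret",
    "stored-refresh-token",
    redirect_uri="https://app.example.com/callback",
)
sent = parse_qs(body)
```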


2. Refresh Token Response 
The authorization server response returns a new access token, a new ID token (because 
the original request included openid), and the list of scopes that were granted when the 


refresh token was obtained to begin with, in this case, during the authorization code grant. 


The reason the authorization server returns the list of granted scopes is that the client
might not really know what this particular refresh token was originally granted with, or if
the conditions at the authorization server changed since its original issuance. Furthermore,
the client can request a certain list of scopes, but the authorization server can always
decide to return a subset of those scopes. In that case, if the authorization server didn't
return the list of scopes that have been granted in the context of this particular refresh
token redemption, the client would have no way of knowing. Even if it remembered the
ones originally requested, there would be no guarantee that such a list would be accurate.
Remember that the client is bound to consider the access token as opaque, hence it cannot
simply look into the access token to find out.


In this particular case, the authorization server does not return a new refresh token 
alongside the access and ID tokens. The client is expected to hold on to the refresh token 


bits it received on the first flow and keep using it until expiration. 


There are various scenarios in which the authorization server does include a new refresh 
token at every refresh token grant. The most notable case is in the context of a security 


measure called token rotation. 


Token rotation guarantees that, whenever you use a refresh token, the bits of that particular 
refresh token will no longer work for any future redemption attempts. Every use of a refresh 
token will cause the authorization server to invalidate it and issue a new one, returned 
alongside the refreshed access token. Clients need to be ready to discard old refresh 


tokens and expect to store new ones at every renewal operation. 


Any attempt to use an old refresh token will cause the authorization server to conclude that 
the request originator stole it. That might trigger protective measures, such as invalidating 
all the other tokens that have been created in the same authenticated session, in case 


the leak indicates a compromised application. Note that this measure might be overkill for
confidential clients, where use from legitimate clients is enforced by requiring applications
to use their client_secret when redeeming refresh tokens. However, it is extremely useful 
for public clients, where apps can redeem refresh tokens without the need to exhibit any 
app credentials. More details about this will be discussed in the next chapter on native 


and mobile clients. 
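To see token rotation from the authorization server's point of view, consider the toy in-memory model below. It is not how any particular product implements rotation: the class, its method names, and the choice of revoking everything tied to the same session upon reuse are assumptions, made to mirror the behavior described above.

```python
import secrets


class RotatingRefreshTokenStore:
    """Toy model of refresh token rotation with reuse detection."""

    def __init__(self):
        self._active = {}            # refresh token -> session id
        self._retired = {}           # already used refresh token -> session id
        self._revoked_sessions = set()

    def issue(self, session_id):
        token = secrets.token_urlsafe(32)
        self._active[token] = session_id
        return token

    def redeem(self, token):
        """Invalidate the presented token and return its replacement."""
        if token in self._retired:
            # An old token came back: assume a leak and revoke the whole session.
            self._revoke_session(self._retired[token])
            raise PermissionError("refresh token reuse detected")
        session_id = self._active.pop(token, None)
        if session_id is None or session_id in self._revoked_sessions:
            raise PermissionError("unknown or revoked refresh token")
        self._retired[token] = session_id
        return self.issue(session_id)

    def _revoke_session(self, session_id):
        self._revoked_sessions.add(session_id)
        for active_token in [t for t, s in self._active.items() if s == session_id]:
            del self._active[active_token]


store = RotatingRefreshTokenStore()
first = store.issue("session-1")
second = store.redeem(first)   # first is now retired, second replaces it
```

Presenting first again would raise an error and, in this model, also invalidate second, which is why clients must store the new refresh token at every renewal.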


3. Calling the API 
The new access token will be used exactly in the same way as the old one: all the
considerations about calling APIs according to the OAuth2 Bearer Token Usage specification
apply.
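As a reminder of what that looks like in practice, this small sketch prepares (without sending) an API request carrying the renewed access token in the Authorization header. The API URL and token value are placeholders.

```python
from urllib.request import Request


def build_api_request(api_url, access_token):
    """Prepare a GET request that presents the access token as a bearer token."""
    request = Request(api_url, method="GET")
    request.add_header("Authorization", f"Bearer {access_token}")
    return request


# Placeholder URL and token value.
request = build_api_request("https://api.example.com/appointments", "new-access-token")
```

Nothing in the request reveals whether the token came from the original code redemption or from a refresh token grant.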


Some Considerations on Refresh Tokens 

The fact that a client requests a refresh token including the scope offline_access signals to the
authorization server that the resulting refresh token's lifetime will be decoupled from the lifetime
of the authenticated user session within which the grant was performed. In other words, whether
a user is signed in or not signed in with an application via the front channel doesn't really matter
with respect to whether the same application is able to redeem a refresh token. Also, the fact
that the app can still use a valid refresh token doesn't say anything about whether there’s an
active sign-in session for the user that helped obtain that refresh token in the first place. The two
things are completely separated. The scenario that offline_access is meant to support is the one
that I described at the beginning of the chapter, where a user wants to schedule a tweet to be
published at a future time regardless of whether the user will be signed in at that time.
In more general terms, it addresses the case in which an application might need to obtain
a valid access token to invoke an API even if no user is present to tend to interaction requests.

One common mistake developers make is to interpret the ability of an application backend to
redeem a refresh token as proof that the user still has a session. Per the above explanation, this
is a dangerous mistake that can lead to resurrecting sessions already expired or terminated via
sign out, making front channel session management ineffective.


When developing applications that need to invoke APIs even without an active user session, 
the app clearly needs to persist refresh tokens so that they are available independently of the 
presence of an interactive session. Even for cases in which API calls are scoped to the interactive 
session lifetime, tokens need to be saved somewhere other than in memory if you want to spare 
users from having to go through token acquisition flows in case the webserver memory recycles. 
Of course, persisting refresh tokens (and tokens in general) requires caution. It’s important to 
make sure that tokens are stored per user, to prevent the possibility of a user ending up accessing 
and using the refresh tokens associated with another user. That's just the same basic hygiene 
required to enforce session separation, but when it comes to tokens, the need to follow best 
practices is all the more critical given the high impact of identity mix-up and the complications 


that derive from persisting user data beyond the interactive session lifetime. 
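Here is one possible shape for a per-user token store, sketched with SQLite so that tokens survive a process restart when a file path is used. The class, table, and column names are invented for the example, and a real deployment would also want to protect the stored values (for example, by encrypting them at rest), which this sketch does not do.

```python
import sqlite3


class UserTokenStore:
    """Keeps exactly one refresh token per user id."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to survive memory recycles.
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS refresh_tokens ("
            "user_id TEXT PRIMARY KEY, refresh_token TEXT NOT NULL)"
        )

    def save(self, user_id, refresh_token):
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO refresh_tokens (user_id, refresh_token) VALUES (?, ?)",
                (user_id, refresh_token),
            )

    def load(self, user_id):
        row = self._db.execute(
            "SELECT refresh_token FROM refresh_tokens WHERE user_id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def delete(self, user_id):
        with self._db:
            self._db.execute("DELETE FROM refresh_tokens WHERE user_id = ?", (user_id,))


tokens = UserTokenStore()
tokens.save("alice", "alice-refresh-token")
tokens.save("bob", "bob-refresh-token")
```

Every read and write is keyed by the user identifier, so a lookup for one user can never return the token saved for another.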


To close the topic of refresh tokens for this chapter, here’s a last recommendation. Even if you 
know the expiration time associated with a refresh token, you should still not rely on that in your 
code. There are many reasons for which a refresh token might stop working, regardless of its 
projected expiration. For example, a user could revoke consent, immediately invalidating refresh 
tokens issued on the basis of previous consent. Another example: a resource server might change 
policy and establish that, from that moment on, it will only accept access tokens obtained via 
multi-factor authentication. This renders any refresh token obtained with a single-factor session 
unable to obtain viable access tokens and forces the client to reobtain a new refresh token via 


multi-factor authentication. Again, all this may happen regardless of the declared expiration of
the original refresh token. For all those reasons, it is prudent to develop client code assuming
that a refresh token might stop working at any time and embed appropriate error management 


and remediation logic upfront. 
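A minimal sketch of that remediation logic follows. It assumes the authorization server reports a refresh token that no longer works with the standard OAuth 2 invalid_grant error code; the exception class and function name are hypothetical, and the caller is expected to react to the exception by sending the user through an interactive flow again.

```python
import json


class ReauthenticationRequired(Exception):
    """Raised when the refresh token can no longer be redeemed."""


def apply_refresh_response(status_code, body_text, current_refresh_token):
    """Interpret a token endpoint response to a refresh token grant."""
    payload = json.loads(body_text)
    if status_code == 200 and "access_token" in payload:
        return {
            "access_token": payload["access_token"],
            # Keep the old refresh token unless the server rotated it.
            "refresh_token": payload.get("refresh_token", current_refresh_token),
            "scope": payload.get("scope"),
        }
    if payload.get("error") == "invalid_grant":
        # Revoked consent, changed policy, expiration, rotation reuse, and so on.
        raise ReauthenticationRequired("refresh token rejected; user interaction needed")
    raise RuntimeError(f"token endpoint error: {payload.get('error', 'unknown')}")


renewed = apply_refresh_response(
    200, '{"access_token": "at-2", "scope": "openid read:appointment"}', "rt-1"
)
```

Note that the function never looks at the refresh token's declared expiration: it only reacts to what the authorization server actually answers.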


Sidebar: Access Tokens vs. ID Tokens 


You have now had the opportunity to see both access tokens and ID tokens in action. Just as important,
you learned about the reasons for which both artifacts have been introduced by OAuth 2 and 
OpenID Connect in the first place. It is worth stepping back for a moment and summarizing the 
differences between the two token types, as confusion about when to use what is one of the 


most common challenges you'll encounter as an identity practitioner. 


Access Tokens Recap 
Access tokens are artifacts meant to enable a client application to access a resource, typically 
on behalf of a resource owner bestowing the client application with delegated authorization. As 


discussed, there is no token format mandated by OAuth 2. 


Earlier on, we discussed the implications of the common topology where authorization server 
and resource server are co-located, making it possible for them to access shared memory and 


making using a format for access tokens unnecessary. 


Conversely, consider an authorization server separated from the resource servers, as is the case
with identity-as-a-service offerings like Auth0, where the same authorization server is shared by
multiple resource servers owned by different companies. This is a scenario that can really benefit
from agreeing on a format and using it for validating incoming tokens, even if the protocol doesn't
offer anything out of the box. The use of JWT as a format for access tokens is so common that
there’s a standardization effort currently ongoing to define an interoperable profile for it - which
I happen to be currently driving. You can find the latest draft (at the time of writing) here.


At the cost of being pedantic, it should be stressed that, as a client app developer, you should 
never write code that inspects the access token content. The fact that in some cases you might 
know that a specific token format is being used doesn’t change this, as the reasons for which it’s 
not a good idea are more about the contracts between client, resource and authorization server. In 
fact, it will often be happenstance that you have a chance to look inside an access token, and the 
situation might change at any time. The format used in an access token is a matter agreed upon by
the resource server and the authorization server, and the details can change at any time at their
discretion without informing the client. Any code predicated on assumptions about the access
token content will break as soon as those assumptions no longer hold, and on occasions without
any remediation. Think of information being removed, or the content being encrypted so that no
entity but the access token's intended recipient can inspect it. Although during troubleshooting
it is legitimate for a developer to read whatever information is available, including the content of
captured tokens, developing code that does so routinely will very often result in downtimes and
serious production problems.


ID Token Recap 

ID tokens are designed to support sign-in operations and, optionally, make authentication 
information available to clients. They don’t contain any delegated authorization information 
(though nothing prevents implementers from extending the default claims set described in the 
specifications with their own custom values). ID tokens come into play during user sign-in, and 


clients can use them to learn about what happened during the authentication flow. Whereas
clients should really not inspect access tokens, as discussed in detail just a few paragraphs
earlier, clients must look inside ID tokens - that’s part of the validation step described in the Web 


Sign-In chapter and mandated by the OpenID Connect core specification. 


One of the most common points of confusion about ID tokens is whether they can be used for 
calling APIs. The short answer is that they shouldn't. Let’s invest a few moments to understand 


why people attempt that, and why it’s generally not a good idea. 


ID tokens are designed to support sign-in operations. The client app is simultaneously the 
requestor and the recipient of the ID token: once the token has been received by the client, it 
has reached its intended destination and isn’t meant to travel any farther. All the client needs to 
do with it is to validate it and extract user attributes, when present. Both are operations that can 
be done locally, thanks to the fact that ID tokens have a fixed format, and the OpenID Connect 
specification details how to perform validation. The ultimate proof that the ID token shouldn't 
leave the client app lies in the aud claim, formalizing that the client app is the intended recipient


by carrying its client_id value. We have discussed all this in Chapter 3, Anatomy of an ID Token. 


Nonetheless, there are real-world situations in which client apps do use ID tokens for invoking
APIs. Often, that is due to designers not fully understanding the underlying protocols, and in
particular, the role of the audience claim. For them, a JWT is a JWT, and an ID token is often
easier to obtain as it doesn’t require registering APIs, defining scopes, and adapting validation
techniques to each specific authorization server's requirements. For example, some authorization
servers will not use JWT as the format for access tokens and will require supporting introspection
calls. Some others might not be designed to protect 3rd party APIs at all, hence not offering API
registration and access token issuance and validation features, but still issue ID tokens for
sign-in purposes.





In the general case, however, using ID tokens for invoking APIs has issues. The main problem
goes to the heart of why we have audiences in the first place. An API receiving an ID token can 
only verify that the token was issued for that particular client: there’s nothing in the token saying 
that it was issued with the intent to call this particular API. Besides the practical issue of being 
unable to insert ad-hoc claims for that particular API, there are serious security concerns: a 
leaked ID token can now be used not just to access the client, but also to invoke this API and all 


the other APIs following the same strategy. 


Whereas properly scoped tokens would contain the blast radius of a leak event (an access token
scoped to API A can only be used with A), many APIs accepting an ID token means that they would
all be compromised at once. This also makes it really hard to maintain separation between APIs:
if both A and B accept ID tokens, that means that when the client calls A, A can turn around and
use the same token it received from the client to invoke B. Although that might be acceptable at
times, in the general case, this should never happen as a side effect.
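The audience reasoning above can be sketched as the check an API would run on the claims of an incoming token once that token has already been validated. The identifiers used for the client and for the two APIs are invented; the only detail taken from the JWT specification is that aud can be either a single string or a list of strings.

```python
def audience_includes(claims, expected_audience):
    """Return True when the token's aud claim names the expected recipient."""
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected_audience
    if isinstance(aud, (list, tuple)):
        return expected_audience in aud
    return False


# Hypothetical identifiers for one client and two APIs.
API_A = "https://api-a.example.com"
API_B = "https://api-b.example.com"

id_token_claims = {"sub": "user-1", "aud": "my-client-id"}
access_token_for_a = {"sub": "user-1", "aud": API_A, "scope": "read:appointment"}

a_accepts_id_token = audience_includes(id_token_claims, API_A)
a_accepts_access_token = audience_includes(access_token_for_a, API_A)
b_accepts_token_meant_for_a = audience_includes(access_token_for_a, API_B)
```

An API that enforces this check rejects the ID token, and API B rejects the token scoped to API A, which is what keeps the blast radius of a leak contained.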


Lastly, I will mention that the use of ID tokens for calling APIs cannot be secured by sender
constraint, as the protocols supporting it won't provide any mechanism to associate the ID token
to a channel between client and API.


For the sake of exhaustiveness, I want to acknowledge a particular situation where the use of
ID tokens for calling an API might not be disastrous, though it’s never as good as using access
tokens. Consider the case in which the client app and the API happen to be the same
logical application. That’s the scenario commonly described as “1st party app”, where both
ends have the same owner and are tightly coupled to implement a given solution. Think of a
social network API and its client app, for example. In this case, the solution won't strictly require
delegation, the incoming token will likely be expected to identify the user, and the tokens issued
to that client won’t be accepted by any API other than the 1st party one (if you exclude cases
where individual app owners decide to accept them anyway, which are outside the control of
the 1st party solution developer).


From the end-user perspective, the client+API ensemble constituting the solution is a logical whole 
- my experience of using my Twitter account through the Twitter app doesn’t usually require any 
special consent where the APIs are explicitly called out. In that case, one could argue that the 
component of the app requesting the token and the component implementing APIs are, in fact, 
the same entity, which could be represented by the same identifier - hence, here’s the crucial 


step, targeted by a token with the same audience... just like an ID token. 


Once, in front of a beer, one of the authors of the OpenID Connect specification told me that 
an ID token is just an access token with specialized semantics. That said, it’s still generally not 
worth it to ever use ID tokens for calling APIs. Although narrowly defined 1st party scenarios do 
exist, those would still be better off when implemented with access tokens (think about sender 
constraint limitations mentioned above) and the risk of overreaching and using the ID token in 
ways that expose you to serious security risks is just too great. I mentioned this particular case
here because you are likely to encounter that approach in the wild if you work in this space
long enough, and I wanted to empower you to understand the nuances and point of view of
the people following that approach: however, the best practice remains using access tokens for 
calling APIs. If you need JWT access tokens, the aforementioned JWT profile for OAuth2 access 


tokens is on its way. 





ID Tokens and the Back Channel 


OpenID Connect offers multiple different ways of signing in. The one we studied in the preceding 
chapter leverages the front channel. It relies on the implicit flow (that is, issuing an ID Token
directly from the authorization endpoint) plus form post (transmitting the token to backend hosted
logic, as is the norm for redirect based apps). That flow is just the one that happens to have
the least number of moving parts, as it doesn’t require the client app to obtain, manage, and
use a client secret. It also is the flow that has more or less the same security characteristics as
traditional protocols such as SAML or WS-Federation, which are still in very wide use in
mission-critical high-value scenarios.


The authorization code grant we just studied in this chapter for calling the API can be, and
commonly is, used for performing sign-in operations - by obtaining ID tokens following the same
steps we studied for requesting an access token. Say that you are in a scenario in which, for some
reason, you don't want to disclose the bits of the ID token to the user’s browser: by using the
authorization code grant, you can make everything take place on the server side. You can just
perform an authorization code grant in the same way we did for getting a token for calling the
API: you just ask for an ID token as well. Note, that’s exactly what we did in our API calling
scenario, by including the openid scope in the initial request. All we need for making that
operation count as sign-in is to validate that ID token and create a front channel session on the
basis of its content. The notable difference from the front channel is that, given that the client
obtains the ID token from a direct HTTPS connection with the token endpoint, there is no
uncertainty about the source of the ID token bits. The client knows for certain that the ID token
comes directly from the authorization server, with no intermediaries that could have tampered
with the content in transit. And with origin and integrity verified, there is no need to validate the
ID token’s signature. Think about it: if you were to validate the signature, you’d use the key you
retrieved from the discovery document. And why do you trust that it is the right key? Because you
retrieved the discovery document over a direct HTTPS channel! The same assumptions hold for
the ID token retrieval from a direct connection with the token endpoint, which is why the client
can skip the signature verification.


What's very, very important to understand is that not having to verify the signature does NOT 
mean that the client is allowed to skip token validation! The client is still meant to validate 
audience, issuer, expiration times, and all the other checks that the OpenID Connect specification 
describes for the ID Token validation. The signature is only one of the many checks a recipient 


should perform to validate incoming tokens, even in the front channel case. 
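The following sketch shows what a few of those remaining checks could look like for an ID token received directly from the token endpoint. It covers only issuer, audience, and expiration; the other checks described by the OpenID Connect specification (nonce, for instance) are left out for brevity. The function names and the 60-second leeway are assumptions, and skipping the signature as done here is only appropriate for the back channel case discussed above.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Decode the claims segment of a JWT WITHOUT verifying its signature."""
    payload_segment = token.split(".")[1]
    padding = "=" * (-len(payload_segment) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_segment + padding))


def validate_id_token_claims(claims, expected_issuer, client_id, now=None, leeway=60):
    """Run a subset of the ID token checks: issuer, audience, expiration."""
    current = time.time() if now is None else now
    if claims.get("iss") != expected_issuer:
        raise ValueError("unexpected issuer")
    aud = claims.get("aud")
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, (list, tuple)):
        audiences = list(aud)
    else:
        audiences = []
    if client_id not in audiences:
        raise ValueError("token was not issued for this client")
    if "exp" not in claims or current >= claims["exp"] + leeway:
        raise ValueError("token expired")
    return claims
```

A client would call decode_jwt_payload on the id_token value found in the token endpoint response, pass the result to validate_id_token_claims, and create its session only if no exception is raised.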


Obtaining an ID token via authorization code is technically more secure than receiving it through the 
front channel. However, this technique is more onerous, as it requires the client to obtain, protect 
and use an application credential - that has a management cost, associated risks (like forgetting 
a secret in source control), performance, and availability challenges (extra server calls). If your
application only needs to sign in users and doesn't have particular constraints about having tokens
transit through the browser, the front channel technique works fine - as demonstrated by many
years of successful SAML deployments using similar techniques to protect high-value scenarios. 
If you are indeed in a situation that calls for higher security, or if you are already performing API 
calls requiring the authorization code flow anyway, you might consider implementing sign-in via 


backchannel as described in this section. 





The Userinfo Endpoint 


A client requesting an ID token without specifying the profile and email scope values will receive 
a skeleton token stating that user X (as expressed by an opaque identifier, usually) successfully 
authenticated with issuer Y. The token also specifies the time and perhaps the authentication 


modes, and no other info - in particular, no user attributes. 


There might be multiple reasons for which a client might opt for such barebone ID token content. 
For example, a client might want such a token to use an easy to set up front channel sign-in flow 
while avoiding disclosure of personally identifiable information (PII) to the browser. Alternatively, 
clients might go that route simply to reduce the size of transferred data on a network that doesn't 
have a lot of bandwidth, or on a metered connection where bigger ID tokens might result in the 


user getting charged more for data use. 


The good news is that clients can opt to work with barebone ID tokens and still gain access to 
user attributes when necessary. OpenID Connect introduced a new API endpoint, called Userinfo 
endpoint, which can be used for retrieving information about the user by presenting an appropriate 
access token - following the same OAuth2 bearer token API calling technique studied earlier in 
this chapter. Whenever the client needs to know something about the user, whether it didn’t save 
the initial ID token or received a barebone one, it reaches out to the Userinfo endpoint using a 
previously obtained access token. It will receive what substantially is the content that the client 


would have gotten in an ID token requested with profile and email scopes. 


The first chapter described the evolution that led from OAuth 2 to OpenID Connect. A key 
passage was about a particular way of abusing OAuth for simulating sign-in, where the ability 


to successfully call an API with an access token was considered proof enough for the client to
consider a user signed in. That had several problems: access tokens could not be tied to a user
in particular (very important if you are trying to authenticate, that is, to sign in), could not be
proven to have been issued as part of a sign-in operation for that app in particular, and could not
be standardized given that every provider protected APIs of different shapes (Facebook Graph,
Twitter API, etc.).


The Userinfo endpoint resolves the first and the 3rd problem. The Userinfo response does provide 
information about the user that obtained the access token used to secure the call to begin with 
- and it’s standard, hence generic SDKs can be built to work against it. That makes it possible 


for a client to implement pure OAuth 2.0 to retrieve user information in a standardized fashion. 


It is very important to realize, however, that successfully calling the Userinfo endpoint is NOT
equivalent to validating an ID token, and alone it CANNOT be used to implement sign-in.
Calling the Userinfo endpoint only proves that the corresponding access token is valid and
associated with the user identity whose attributes are returned: it does NOT prove that the
access token was issued for that particular client. OpenID Connect sign-in operations ALWAYS
require validating an ID token, although, as we have seen, in some circumstances the signature
check can be skipped from the validation checklist.


Another thing to take into account when considering using the Userinfo endpoint from a
confidential client is that all the discussions about the burden of using a secret apply here, as 


that’s part of obtaining an access token. 


After all that preamble, let’s take a look at how an actual call to the Userinfo endpoint takes 
place. As usual, we are going to explain each step - please refer to the numbered messages in 


the diagram in Figure 4.5.

Figure 4.5

1. Userinfo Request
The scenario in the diagram assumes that the client has already obtained a suitable access 
token for calling the Userinfo endpoint. Invoking the Userinfo endpoint is simply an HTTP 


GET request, attaching said access token in an authorization header. 


You might notice that in this particular network trace, the access token value looks different 
from all the other tokens shown in the diagrams so far. Whereas token values in earlier 
diagrams were always clipped for presentation purposes, and their shape suggested the 
classic JWT encoding, the bits on display here are the entirety of the access token and 
don’t appear to follow any known pattern. That's because calling the Userinfo endpoint 
is precisely a scenario in which the use of opaque, formatless tokens makes sense. The 
Userinfo endpoint is co-located with the authorization server: there is no need for
cross-boundary communication. The entity that issued the access token in the first place is
the same entity responsible for validating it during the Userinfo API call. That means that 
the two tasks can access the exact same memory space. In concrete terms, this means 
that the access token intended to access the Userinfo API doesn't need to be encoded 
in any particular format. It can literally be the identifier of a row in a database that was 
created at issuance time and can now be looked up at API invocation time, or any other 


technique relying on shared memory. 


This is a luxury we cannot afford when the API being invoked is managed by a 3rd party and 
hosted elsewhere. In this scenario, the parties involved are forced to rely on token validation 
based on formats, introspection, and in general, techniques meant to accommodate the 


lack of shared memory between the entity issuing the token and the entity consuming it. 
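The "identifier of a row in a database" idea can be illustrated with a toy issuer, where a dictionary stands in for the shared storage. The class and method names are made up, and a real authorization server would also track expiration and revocation, which are omitted here.

```python
import secrets


class OpaqueTokenIssuer:
    """Toy issuer whose access tokens are just random lookup keys."""

    def __init__(self):
        self._records = {}  # token value -> facts recorded at issuance time

    def issue(self, user_id, scopes):
        token = secrets.token_urlsafe(24)   # random bits, no embedded claims
        self._records[token] = {"sub": user_id, "scope": list(scopes)}
        return token

    def lookup(self, token):
        """What the co-located Userinfo endpoint would do at invocation time."""
        return self._records.get(token)


issuer = OpaqueTokenIssuer()
opaque_token = issuer.issue("user-1", ["openid", "profile"])
record = issuer.lookup(opaque_token)
```

The token carries no information of its own: anyone without access to the issuer's storage, the client included, has nothing to decode.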





2. Userinfo Response 
The response returned by the Userinfo endpoint contains pretty much the same 
list of claims carried by an ID token obtained via a request that includes the profile 


scope. 
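A client consuming that response might process it along these lines. The sketch relies on one check that OpenID Connect Core does require: the sub claim in the Userinfo response has to match the sub claim of the ID token the client validated at sign-in, otherwise the response must not be used. The function name and sample claim values are illustrative.

```python
import json


def parse_userinfo_response(body_text, expected_sub):
    """Parse a Userinfo JSON response and tie it to the signed-in user."""
    claims = json.loads(body_text)
    if claims.get("sub") != expected_sub:
        # Attributes for a different subject must be discarded.
        raise ValueError("Userinfo sub does not match the ID token sub")
    return claims


# Sample response body with made-up values.
sample_body = '{"sub": "user-1", "name": "Jane Doe", "email": "jane@example.com"}'
profile = parse_userinfo_response(sample_body, expected_sub="user-1")
```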


The Hybrid Grant 


The hybrid grant is, as the name suggests, a mix of multiple flows into one. It combines a
sign-in operation (getting an ID token from the front channel) and obtaining an access token for
invoking an API from the client backend (by requesting and redeeming an authorization code).
That saves network round trips, consolidates prompts and consent requests, and is, in general,
a very efficient way of performing a sign-in operation while getting ready to invoke APIs at the
same time. No diagram is shown for the hybrid grant, as you can easily piece it together yourself
by combining the web sign-in flow diagram in the preceding chapter and the authorization code
flow shown here. OpenID Connect is unique in this ability to mix and match sign-in and calling
APIs and having entities playing both roles: a “resource”, as in something being accessed as part
of the sign-in access, and a client, consuming other resources such as APIs. The fact that the app
in OpenID Connect is always called a client, emphasizing the latter role and omitting the former,
is a nod to its OAuth 2 origins (and to the fact that “resource” in OAuth 2 is reserved for APIs).
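In lieu of a diagram, here is a rough sketch of a hybrid authorization request. It asks for both an authorization code and an ID token in one round trip by using the OpenID Connect response_type value "code id_token", delivered via form_post. Endpoint, client registration values, state, and nonce are all placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_hybrid_authorization_url(authorize_endpoint, client_id, redirect_uri,
                                   scopes, state, nonce):
    """Build a request that returns an ID token and an authorization code together."""
    query = urlencode({
        "response_type": "code id_token",   # sign-in artifact plus code for API access
        "response_mode": "form_post",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
        "nonce": nonce,                      # echoed in the ID token to bind it to this request
    })
    return f"{authorize_endpoint}?{query}"


# Placeholder values throughout.
hybrid_url = build_hybrid_authorization_url(
    "https://issuer.example.com/authorize",
    "my-client-id",
    "https://app.example.com/callback",
    ["openid", "profile", "read:appointment"],
    "opaque-state-value",
    "random-nonce-value",
)
hybrid_params = parse_qs(urlparse(hybrid_url).query)
```

The ID token arriving with the form post is validated for sign-in, while the code is redeemed from the backend exactly as in the authorization code grant.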


The hybrid grant is a really powerful tool that is commonly used in applications. In fact, today, it's 
pretty rare to be able to state that an app will forever either only require sign-in, or only call APIs. 
It's usually a continuum, and the availability of this grant makes it easy to add one functionality 
or the other by simply modifying either the implicit plus form_post grant or the authorization 


code grant.

Client Credentials Grant


In the last section of this chapter dedicated to invoking APIs, we will study the client credentials
grant - a flow defined by OAuth 2 for the cases where a client needs to get access tokens using
its own programmatic identity, rather than doing so on behalf of a user. Unlike the grants we
examined so far, the client credentials grant has no public client variant - it can only be performed
by a confidential client. All the flows examined so far for APIs are designed to grant clients delegated
access to resources, that is to say, enabling clients to “borrow” some of the user's privileges
when accessing resources.


There are a number of situations in which clients need to operate as themselves, rather than
on behalf of a user. These are scenarios in which the application has an identity and has direct
resource privileges in itself. That class of scenarios doesn’t require a user to be signed in or
otherwise present. Even if a user happens to be signed in at the time of access, their privileges
might not be the ones the client needs to exercise. A classic example of that scenario occurs
when an application needs to perform an operation that the currently signed-in user has no
privilege for. Imagine, for example, a continuous integration (CI) web app in which the final step
of a build process is taking the binaries of a compiled product and saving them in a particular
share that no user has access to.


One way of working around the problem would be to open the floodgates and give every user the
permission to access that share. That would preserve the CI's ability to call the share in delegated
access mode. However, the risk of abuse would be very high: users might choose to exercise
their privileges on that file share even outside of the CI process.





An alternative would be to give privileges for file share access to the application itself. In turn, 
the application can feature logic that determines which users should be able to write to the 
share. So, it can use its own write privileges to perform write operations only for the appropriate 
user sessions, and only within the limits of what the CI logic requires. Said in another way, by
granting the application itself the privileges required to access a resource, the responsibility of 
determining who can do what is transferred from the authorization server to the application itself, 


which becomes the gatekeeper for the resource. 
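The gatekeeper role described above can be sketched in a few lines. This is only an illustration under stated assumptions: the list of allowed users, the function name, and the write_to_share callback are all hypothetical, and a real CI app would perform the write by calling the share's API with its own access token.

```python
# Hypothetical application policy: which signed-in users may trigger a publish.
PUBLISHERS = {"alice", "bob"}

def publish_build(signed_in_user, build_succeeded, write_to_share):
    """Write binaries to the share with the app's own privileges, if policy allows."""
    if signed_in_user not in PUBLISHERS:
        raise PermissionError(f"{signed_in_user} may not publish builds")
    if not build_succeeded:
        raise PermissionError("only successful builds are published")
    # The write happens as the application (with its own token), not as the user.
    return write_to_share()

written = []
publish_build("alice", True, lambda: written.append("product.zip"))

try:
    publish_build("mallory", True, lambda: written.append("unexpected.zip"))
except PermissionError as error:
    print("rejected:", error)

print(written)
```

Note that the authorization server plays no part in the decision: the check on who can do what lives entirely in the application's own logic.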


One common way of referring to the aforementioned pattern is to say that the application and 


the downstream APIs it accesses are defined as a trusted subsystem. 


To use a real-world analogy, consider how a classic amusement park handles visitors’ access. At 
the entrance, a visitor pays for a ticket and is given a bracelet or equivalent visible sign that the 
individual paid for access. This sign does not need to bear any indication of the identity of the 
wearer. Once the guest is in, she can enjoy every ride without any further access control check 


other than the bracelet, broadcasting her right to be on the premises. 


Similarly, once a user signs in with the CI web app, all the subsequent calls to the downstream
API will be performed as the web app itself, simply by virtue of the fact that the user successfully
signed in. In a way, you can think of this as a resurgence of the concept of perimeter. However,
the big difference from a traditional network perimeter is that the boundaries here are mostly
logical (the API's willingness to accept tokens issued to the CI app client) rather than physical
(actual network boundaries).


This class of patterns is pretty common in the context of microservices, where there is a gateway 
that validates the identity of a caller. Once that check has been successfully performed, all the 


subsequent calls from the gateway can be performed carrying tokens identifying the calling app 
rather than the user. The user information might still be required, but it doesn’t strictly need to


travel in an issued token. 


As is the case with every confidential client flow, the critical point here is taking particular
care in provisioning client credentials and maintaining them: for example, by making sure that
no entity other than the application has access to its credentials. Another critical aspect of the
scenario, not explicitly covered by the standards but of vital importance, is to carefully choose the
privileges assigned to the application and the application logic exercising them. The least privilege
principle remains a key best practice in this scenario.


Let's take a look at how the client credentials grant actually works on the wire: please refer to 


Figure 4.6. 
Figure 4.6





1. Access Token Request
The client application requests a token by contacting the token endpoint directly, similarly 
to what we have observed in the server-side segments of all the grants we have studied 


so far. 


In the sample scenario we have been discussing so far, the call is performed during a user
session - however, that is entirely arbitrary. Remember that the client credentials grant
relies only on the client’s own identity rather than requesting delegated authorization from
a user. So, from the OAuth 2 standpoint, the flow described here might just as well occur
in a command-line tool, a long-running process, or in general, any kind of application
executed in a context where distribution and protection of client credentials are possible.


The request is a customary HTTP POST, carrying the well-known client_id, client_secret, 


and grant_type (this time, set to client_credentials). 


Observing the body of the POST message, one notable difference from all the grants
encountered so far is that the message for the token endpoint doesn’t contain any artifact
besides the client credentials. In contrast, the authorization code grant and the refresh token
grant both include some other artifact to redeem (an authorization code or a refresh token,
respectively). Once again, this shows why the other flows are conceivable with public clients
as well, whereas the client credentials grant isn’t possible without, well, client credentials.


Here it’s opportune to stress that client credentials and the client credentials grant are two 
separate, distinct concepts. Client ID and client secret are the client credentials assigned 
to a confidential client application and are used to identify the client app in every grant 


whenever communication with the token endpoint occurs. The client credentials grant is
a grant which happens to require only the client credentials, and no other artifact, to be
performed. It’s easy to get confused when using the terms loosely: whenever you hear
someone mentioning “client credentials”, it’s useful to be clear on whether they are talking
about the grant, or just about the client ID and client secret.


One last observation on the request message: the audience parameter is required to
indicate to the authorization server what resource the client is requesting access to. This
information is necessary for authorization servers that can protect multiple resource servers,
where there’s no default resource the authorization server can refer to. As mentioned in
our earlier discussions about the audience parameter, the standard way of signaling that
information to the authorization server is through the resource parameter as defined in the
resource indicators specification (RFC 8707), which reached RFC status only a few months
ago. At the time of writing, Auth0 doesn’t support resource indicators.
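Putting the pieces of the request together, the following is a minimal sketch of how a client could assemble the POST described above. The token endpoint URL, client ID, client secret, and audience value are made-up placeholders, and the request is only built here, not sent. The audience parameter is included as discussed in the text; an authorization server that implements resource indicators would expect the resource parameter instead.

```python
import urllib.request
from urllib.parse import urlencode, parse_qs

def build_token_request(token_endpoint, client_id, client_secret, audience):
    """Prepare (but do not send) a client credentials token request."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": audience,   # the API the client wants a token for
    }).encode("ascii")
    return urllib.request.Request(
        token_endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

request = build_token_request(
    "https://authorization-server.example/oauth/token",  # hypothetical endpoint
    "my-ci-app",
    "not-a-real-secret",
    "https://api.example/builds",
)
print(request.get_method(), request.full_url)
print(sorted(parse_qs(request.data.decode("ascii"))))
```

Notice that nothing in the body needs to be redeemed: there is no code and no refresh_token, only the client's own credentials and the indication of the target resource.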


2. Token Response 
The token endpoint response is entirely unsurprising, carrying back the requested access
token just as described for the other grants.

Of course, there is no id_token, given that the grant didn’t entail user identity in any
capacity.


Notably absent is the refresh token, too. In this scenario, it would simply serve no purpose.
The refresh token is meant to allow a client app to obtain a new access token to substitute
an expired one, and to do so without bugging the user with an extra prompt. However,
there is no need to ask anything of a user here, as the client credentials are available to
the client app at any time to request a new token.
Important note: the mechanism shouldn't be abused. Once a client requests and obtains
an access token, it should keep it around (stored with all the safety measures the task 
requires) for the duration of its useful lifetime and use it whenever it needs to call an 
API. Discarding still-valid access tokens and requesting a new access token from the 
authorization server every time can be a costly anti-pattern, at all levels: security (every 
time credentials are sent on the wire, there’s an opportunity for something to go wrong), 
performance (network calls), availability (possibility of being throttled, transient network 


failures), and money (various providers charge per issued token). 
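One possible way to follow this advice is a small cache that hands out the same access token until shortly before it expires. The sketch below is an assumption-laden illustration, not a prescribed implementation: the TokenCache class, the 30-second safety margin, and the fake_fetch stand-in for the token endpoint call are all hypothetical, and the token is kept in memory only, without the storage protections a real deployment would need.

```python
import time

class TokenCache:
    """Keeps one access token and reuses it until shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, safety_margin=30):
        self._fetch_token = fetch_token        # callable returning a token response dict
        self._clock = clock                    # injectable so the example can fake time
        self._safety_margin = safety_margin    # seconds subtracted from the lifetime
        self._access_token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._access_token is None or now >= self._expires_at:
            response = self._fetch_token()     # only here do credentials go on the wire
            self._access_token = response["access_token"]
            self._expires_at = now + response["expires_in"] - self._safety_margin
        return self._access_token

# Stand-in for the real call to the token endpoint: it only counts invocations.
calls = []

def fake_fetch():
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "token_type": "Bearer", "expires_in": 3600}

fake_now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])

first = cache.get()
second = cache.get()     # reused, no new request
fake_now[0] = 3600.0     # move past the token's useful lifetime
third = cache.get()      # only now is a new token requested
print(first, second, third, len(calls))
```

Three calls to the API result in only two token requests, and the second one happens only because the first token is no longer usable.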


Note that, in this particular case, Auth0 uses scope to represent what the client can do.
Given what we said earlier about scopes, this is a bit controversial.

Scopes normally restrict the privileges that the client can use to a subset of the privileges
that the user has, and here there is no user. Even if it appears not quite appropriate,
that's how Auth0 does it today: the scope simply represents the privileges that have been
granted to the client application. There is no real security risk because of this: if a resource
server were to interpret the incoming scopes as the delegated authorization concepts we
discussed so far, the power they’d confer to the caller would be less, not more. However,
it’s an exception that is important to be aware of.
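From the API's point of view, this means the scopes in the incoming token can be checked just as they would be for a delegated token. The sketch below assumes the token has already been validated and that its scopes arrive as a space-delimited string in a scope claim; the claim values and scope names are invented for illustration.

```python
def granted_scopes(token_claims):
    """Return the set of scopes carried by an already validated access token."""
    return set(token_claims.get("scope", "").split())

def require_scope(token_claims, required):
    """Raise if the token does not carry the scope an operation needs."""
    if required not in granted_scopes(token_claims):
        raise PermissionError(f"missing scope: {required}")

# Hypothetical claims of a token issued through the client credentials grant.
claims = {"aud": "https://api.example/builds", "scope": "read:builds write:builds"}

require_scope(claims, "write:builds")       # passes silently
try:
    require_scope(claims, "delete:builds")  # never granted to this client
except PermissionError as error:
    print(error)
```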


3. Calling the API 
As expected, the call to the API occurs as usual, without any dependency on how the client 
obtained the access token being used to protect that call. This completes our journey to 
understand how to leverage OAuth2 and OpenID Connect to invoke APIs from a traditional 


web app, and in general, any confidential client. 
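As a reminder of what "as usual" means, here is a minimal sketch of the API call with the access token attached as a bearer token in the Authorization header. The API URL and token value are placeholders, and the request is only constructed, not sent.

```python
import urllib.request

def build_api_request(api_url, access_token):
    """Prepare (but do not send) an API call protected by an access token."""
    return urllib.request.Request(
        api_url,
        headers={"Authorization": f"Bearer {access_token}"},
    )

api_request = build_api_request(
    "https://api.example/builds",   # hypothetical API
    "placeholder-access-token",     # would come from the token response
)
print(api_request.get_header("Authorization"))
```

Nothing in this request reveals which grant produced the token, which is exactly the point made above.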


In the next chapter, we'll take a look at native clients, mobile clients and pretty much any 


application that an end-user can directly operate... and that isn’t a browser. 
Chapter 5 - Desktop and Mobile Apps


COMING SOON 
Chapter 6 - Single Page Applications


COMING SOON 